Prediction image generation method, image coding method, image decoding method, and prediction image generation apparatus

ABSTRACT

Provided is a prediction image generation method for generating a prediction image of a current block, the prediction image generation method including: an extraction step of extracting a plurality of first feature points each of which has a local feature quantity, the plurality of first feature points being included in a reconstructed image; a search step of searching a corresponding point from the plurality of first feature points, the corresponding point having a local feature quantity similar to a local feature quantity of a second feature point corresponding to the current block, a relationship between the corresponding point and the second feature point being expressed by information including a non-parallel translation component; and a generation step of generating the prediction image from the reconstructed image based on the relationship.

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present disclosure relates to a prediction image generation method, an image coding method, an image decoding method, and a prediction image generation apparatus.

2. Description of the Related Art

In order to improve coding efficiency, various studies have been made regarding the HEVC (High Efficiency Video Coding) standard, which is the latest video coding standard (see, for example, NPL 1). This scheme is one of the ITU-T (International Telecommunication Union Telecommunication Standardization Sector) standards called H.26x and one of the ISO/IEC (International Organization for Standardization/International Electrotechnical Commission) standards called MPEG-x (Moving Picture Experts Group-x), and has been studied as a successor to the video coding standard called H.264/AVC (Advanced Video Coding) or MPEG-4 AVC.

In the prediction image generation process used during coding in both AVC and HEVC, only pixel information on an adjacent block is used in intra prediction, and only parallel translation is used in inter prediction.

CITATION LIST

Non-Patent Literature

-   NPL 1: Joint Collaborative Team on Video Coding (JCT-VC) of ITU-T SG16 WP3 and ISO/IEC JTC1/SC29/WG11, 12th Meeting: Geneva, CH, 14-23 Jan. 2013, JCTVC-L1003_v34.doc, High Efficiency Video Coding (HEVC) text specification draft 10 (for FDIS & Last Call), http://phenix.it-sudparis.eu/jct/doc_end_user/documents/12_/wg11/JCTVC-L1003-v34.zip

SUMMARY OF THE INVENTION

There is a demand for improving the coding efficiency of the prediction image generation method, the image coding method, and the image decoding method.

An object of the present disclosure is to provide a prediction image generation method, an image coding method, or an image decoding method capable of improving the coding efficiency.

One aspect of the present disclosure provides a prediction image generation method for generating a prediction image of a target block, the prediction image generation method including: an extraction step of extracting a plurality of first feature points each of which has a local feature quantity, the plurality of first feature points being included in a reconstructed image; a search step of searching a corresponding point from the plurality of first feature points, the corresponding point having a local feature quantity similar to a local feature quantity of a second feature point corresponding to the target block, a relationship between the corresponding point and the second feature point being expressed by information including a non-parallel translation component; and a generation step of generating the prediction image from the reconstructed image based on the relationship.

Note that these general or specific aspects may be implemented using a system, a method, an integrated circuit, a computer program, or a computer-readable recording medium such as a CD-ROM (Compact Disc-Read Only Memory), or using any given combination of a system, a method, an integrated circuit, a computer program, and a computer-readable recording medium.

The present disclosure can provide a prediction image generation method, an image coding method, or an image decoding method capable of improving the coding efficiency.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block diagram illustrating a configuration of an image coding apparatus according to a first exemplary embodiment;

FIG. 2 is a flowchart illustrating an example of an image coding process according to the first exemplary embodiment;

FIG. 3 is a flowchart illustrating an example of a prediction block generation process according to the first exemplary embodiment;

FIG. 4 is a flowchart illustrating an example of an intra prediction process according to the first exemplary embodiment;

FIG. 5 is a flowchart illustrating an example of a feature point use intra prediction process of the first exemplary embodiment;

FIG. 6 is a flowchart illustrating an example of an in-block mode intra prediction process of the first exemplary embodiment;

FIG. 7 is a view illustrating an example of an in-block mode prediction image generation process of the first exemplary embodiment;

FIG. 8 is a flowchart illustrating an example of an in-block mode feature-point-related information coding process of the first exemplary embodiment;

FIG. 9 is a flowchart illustrating an example of a surrounding block mode intra prediction process of the first exemplary embodiment;

FIG. 10 is a view illustrating an example of a surrounding block mode prediction image generation process of the first exemplary embodiment;

FIG. 11 is a flowchart illustrating an example of a surrounding block mode feature-point-related information coding process of the first exemplary embodiment;

FIG. 12 is a view illustrating extraction range information of the first exemplary embodiment;

FIG. 13 is a flowchart illustrating an example of an inter prediction process of the first exemplary embodiment;

FIG. 14 is a flowchart illustrating an example of a feature point use inter prediction process of the first exemplary embodiment;

FIG. 15 is a flowchart illustrating an example of the in-block mode inter prediction process of the first exemplary embodiment;

FIG. 16 is a view illustrating an example of the in-block mode prediction image generation process of the first exemplary embodiment;

FIG. 17 is a flowchart illustrating an example of the surrounding block mode inter prediction process of the first exemplary embodiment;

FIG. 18 is a view illustrating an example of the surrounding block mode prediction image generation process of the first exemplary embodiment;

FIG. 19 is a flowchart illustrating an example of a motion information use inter prediction process according to a second exemplary embodiment;

FIG. 20 is a flowchart illustrating an example of a motion estimation process of the second exemplary embodiment;

FIG. 21 is a block diagram illustrating a configuration of an image decoding apparatus according to a third exemplary embodiment;

FIG. 22 is a flowchart illustrating an example of an image decoding process of the third exemplary embodiment;

FIG. 23 is a flowchart illustrating an example of a prediction information decoding process of the third exemplary embodiment;

FIG. 24 is a flowchart illustrating an example of a feature-point-related information decoding process of the third exemplary embodiment;

FIG. 25 is a flowchart illustrating an example of the in-block mode feature-point-related information decoding process of the third exemplary embodiment;

FIG. 26 is a flowchart illustrating an example of the surrounding block mode feature-point-related information decoding process of the third exemplary embodiment;

FIG. 27 is a flowchart illustrating an example of a prediction block generation process of the third exemplary embodiment;

FIG. 28 is a flowchart illustrating an example of an intra prediction image generation process of the third exemplary embodiment;

FIG. 29 is a flowchart illustrating an example of a feature point use intra prediction process of the third exemplary embodiment;

FIG. 30 is a flowchart illustrating an example of the in-block mode intra prediction image generation process of the third exemplary embodiment;

FIG. 31 is a flowchart illustrating an example of the surrounding block mode intra prediction image generation process of the third exemplary embodiment;

FIG. 32 is a flowchart illustrating an example of an inter prediction image generation process of the third exemplary embodiment;

FIG. 33 is a flowchart illustrating an example of a feature point use inter prediction process of the third exemplary embodiment;

FIG. 34 is a flowchart illustrating an example of the in-block mode inter prediction image generation process of the third exemplary embodiment;

FIG. 35 is a flowchart illustrating an example of the surrounding block mode inter prediction image generation process of the third exemplary embodiment;

FIG. 36 is a flowchart illustrating an example of a prediction image generation process of the third exemplary embodiment;

FIG. 37 is an overall configuration diagram illustrating a content supply system that implements a content distribution service;

FIG. 38 is an overall configuration diagram of a digital broadcasting system;

FIG. 39 is a block diagram illustrating a configuration example of a television;

FIG. 40 is a block diagram illustrating a configuration example of an information playback/recording unit that reads and writes information from and to a recording medium that is an optical disk;

FIG. 41 is a view illustrating a structural example of the recording medium that is the optical disk;

FIG. 42A is a view illustrating an example of a mobile phone;

FIG. 42B is a block diagram illustrating a configuration example of the mobile phone;

FIG. 43 is a view illustrating a structure of multiplexed data;

FIG. 44 is a schematic diagram illustrating how each stream is multiplexed in the multiplexed data;

FIG. 45 is a view illustrating in more detail how a video stream is stored in a PES packet sequence;

FIG. 46 is a view illustrating structures of a TS (Transport Stream) packet and a source packet in the multiplexed data;

FIG. 47 is a view illustrating a data configuration of a PMT;

FIG. 48 is a view illustrating an internal configuration of multiplexed data information;

FIG. 49 is a view illustrating an internal configuration of stream attribute information;

FIG. 50 is a view illustrating steps of identifying video data;

FIG. 51 is a block diagram illustrating a configuration example of an integrated circuit that implements a video coding method and a video decoding method according to each of the exemplary embodiments;

FIG. 52 is a view illustrating a configuration that switches a driving frequency;

FIG. 53 is a view illustrating steps of identifying the video data to switch the driving frequency;

FIG. 54 is a view illustrating an example of a lookup table in which a video data standard and a driving frequency are associated with each other;

FIG. 55A is a view illustrating an example of a configuration in which a signal processor module is shared; and

FIG. 55B is a view illustrating another example of a configuration in which the signal processor module is shared.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

Underlying Knowledge of Present Disclosure

In conventional image coding schemes, the pixel information on the adjacent block is used to generate the prediction image in intra prediction, and only parallel translation information is used in inter prediction. For this reason, in intra prediction, even if a similar region exists in the same image, information on that region cannot be used unless the region is located in the surroundings of the target block. In inter prediction, even if two images would become similar to each other after a deformation such as scaling or rotation, information on the similar image cannot be used.

Techniques that use higher-order motion information, such as an affine transform, have also been discussed for inter prediction. With such techniques, scaling or rotation of a subject can be expressed by applying a geometric transform to the motion information, which improves the quality of the generated prediction image. The coding efficiency is also improved by increasing the size of the prediction unit.

However, an affine transform requires at least six-dimensional information, because three kinds of deformation (scaling, rotation, and shear) must be expressed in addition to the parallel translation. A projection transform requires at least eight-dimensional information. Thus, when high-order motion information is used, the calculation quantity necessary for the motion information estimation process increases.
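The parameter counts above can be made concrete by mapping a single point under each model. This is only an illustration that assumes common parameterizations (six coefficients a to f for the affine model, and eight coefficients h0 to h7 for the projection model with the ninth homography entry fixed to 1); these names do not come from the present disclosure.

```python
def affine_map(params, x, y):
    """Map (x, y) with a 6-parameter affine model (a, b, c, d, e, f)."""
    a, b, c, d, e, f = params
    return a * x + b * y + c, d * x + e * y + f


def projective_map(params, x, y):
    """Map (x, y) with an 8-parameter projection model (h0..h7, h8 fixed to 1)."""
    h0, h1, h2, h3, h4, h5, h6, h7 = params
    w = h6 * x + h7 * y + 1.0  # perspective divisor
    return (h0 * x + h1 * y + h2) / w, (h3 * x + h4 * y + h5) / w
```

A pure parallel translation uses only c and f (with a = e = 1 and b = d = 0), which corresponds to the two components of a conventional motion vector; each additional parameter enlarges the space that motion estimation has to search.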

In the exemplary embodiments, this problem is solved by applying a technology used in computer vision.

With the progress of computer vision technology, various feature point and feature quantity extraction techniques, typified by SIFT (Scale-Invariant Feature Transform) and ORB (Oriented FAST and Rotated BRIEF), have been proposed. In these techniques, a feature point is extracted at a highly reliable edge or corner portion of an image, and information called a feature quantity is generated from the magnitude or distribution of pixel information and gradient information in the surroundings of the extracted feature point. Hereinafter, the feature quantity is also referred to as a local feature quantity.

Many feature quantity extraction techniques are robust to scaling and rotation. Therefore, the feature point information includes parameters called a rotation quantity (rotation angle) and a scale value. The feature quantities at the feature points are compared to each other, and two feature points whose feature quantities have a small Euclidean distance are set as corresponding points. A feature point matching process of searching for the corresponding point in a feature point group is also used, for example, in generating a panoramic image. By applying the technology concerning feature quantities to prediction image generation, the prediction image can be generated with higher accuracy than with existing techniques.
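The corresponding point search described above can be sketched as an exhaustive nearest-neighbour search over feature quantities. This is a minimal illustration assuming each descriptor is a tuple of floats compared by Euclidean distance, as in the text; practical matchers often add acceleration structures or ambiguity checks, and binary descriptors such as those of ORB are usually compared with the Hamming distance instead.

```python
import math


def find_corresponding_point(query, candidates):
    """Return (index, distance) of the candidate descriptor nearest to query.

    query: descriptor (sequence of floats) of the feature point to match.
    candidates: descriptors of the feature point group being searched.
    Returns (None, inf) when candidates is empty.
    """
    best_index, best_dist = None, math.inf
    for i, desc in enumerate(candidates):
        dist = math.dist(query, desc)  # Euclidean distance between feature quantities
        if dist < best_dist:
            best_index, best_dist = i, dist
    return best_index, best_dist
```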

One aspect of the present disclosure provides a prediction image generation method for generating a prediction image of a target block, the prediction image generation method including: an extraction step of extracting a plurality of first feature points each of which has a local feature quantity, the plurality of first feature points being included in a reconstructed image; a search step of searching a corresponding point from the plurality of first feature points, the corresponding point having a local feature quantity similar to a local feature quantity of a second feature point corresponding to the target block, a relationship between the corresponding point and the second feature point being expressed by information including a non-parallel translation component; and a generation step of generating the prediction image from the reconstructed image based on the relationship.

Therefore, in the prediction image generation method, the prediction image can be generated using the image of a reference region to which a deformation including the non-parallel translation component, such as scaling and rotation, is applied, the reference region being determined using the feature point and the feature quantity. As a result, the prediction image generation method can improve the coding efficiency.
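One way to picture the relationship between the second feature point and its corresponding point is as a similarity transform built from the rotation quantity and scale value that each feature point carries. The sketch below is a hypothetical illustration, not the procedure of the present disclosure: it assumes a `FeaturePoint` record introduced here for the example, angles in radians measured in the same convention for both points, and a mapping that translates a pixel relative to the second feature point, rotates it by the angle difference, scales it by the scale ratio, and re-anchors it at the corresponding point.

```python
import math
from dataclasses import dataclass


@dataclass
class FeaturePoint:
    x: float
    y: float
    angle: float  # rotation quantity, in radians (assumed unit)
    scale: float  # scale value


def make_mapping(target_fp, ref_fp):
    """Return f(x, y) mapping target-picture coordinates to reference coordinates.

    The map rotates by the angle difference and scales by the scale ratio
    about the second feature point, then re-anchors at the corresponding point.
    """
    k = ref_fp.scale / target_fp.scale
    theta = ref_fp.angle - target_fp.angle
    cos_t, sin_t = math.cos(theta), math.sin(theta)

    def mapping(x, y):
        dx, dy = x - target_fp.x, y - target_fp.y
        return (ref_fp.x + k * (cos_t * dx - sin_t * dy),
                ref_fp.y + k * (sin_t * dx + cos_t * dy))

    return mapping
```

Under these assumptions, only the two feature points are needed to rebuild the mapping, so a decoder that extracts the same feature points from the same reconstructed image would not need explicit transform parameters.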

For example, the second feature point may be included in the target block, and in the generation step, the prediction image may be generated using a pixel value of a region including the corresponding point in the reconstructed image.
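A minimal sketch of this generation step follows. It assumes the reconstructed image is a list of pixel rows, that the relationship is available as any callable `mapping(x, y)` returning reference-image coordinates (a hypothetical interface), and that nearest-neighbour sampling with clamping at the image border is acceptable; a practical codec would more likely use sub-pixel interpolation filters.

```python
def predict_block(ref, bx, by, size, mapping):
    """Build a size x size prediction block for the target block at (bx, by).

    ref: reconstructed image as a list of pixel rows.
    mapping: callable (x, y) -> (x', y') from target to reference coordinates.
    """
    h, w = len(ref), len(ref[0])
    block = []
    for y in range(by, by + size):
        row = []
        for x in range(bx, bx + size):
            rx, ry = mapping(x, y)
            # Nearest-neighbour sampling, clamped to the image border.
            ix = min(max(int(round(rx)), 0), w - 1)
            iy = min(max(int(round(ry)), 0), h - 1)
            row.append(ref[iy][ix])
        block.append(row)
    return block
```

Passing a mapping such as `lambda x, y: (2 * x, 2 * y)` yields a block sampled from a region twice as large, a deformation that a parallel translation alone cannot express.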

For example, the second feature point may be a feature point in the surroundings of the target block, and in the generation step, the prediction image may be generated using a pixel value of a region that does not include the corresponding point in the reconstructed image.

For example, the reconstructed image may be a reconstructed image of a target picture including the target block.

For example, the reconstructed image may be a reconstructed image of a picture different from the target picture including the target block.

Another aspect of the present disclosure provides an image coding method in which the prediction image generation method is performed, the image coding method including an image coding step of coding the target block using the prediction image.

Therefore, in the image coding method, a reference region to which a deformation including the non-parallel translation component is applied can be determined using the feature point and the feature quantity, and the prediction image can be generated using the image of the reference region. As a result, the image coding method can improve the coding efficiency.

For example, the image coding method may further include a feature point information coding step of coding feature point information identifying the second feature point among a plurality of third feature points corresponding to the target block. In this case, the plurality of third feature points may be extracted in the extraction step, and the second feature point may be selected from the plurality of third feature points in the search step.

For example, the feature point information may indicate a coordinate of the second feature point.

For example, the feature point information may indicate a rotation quantity or a scale value of the second feature point.

For example, the image coding method may further include a corresponding point information coding step of coding corresponding point information identifying the corresponding point among the plurality of first feature points.

For example, the corresponding point information may indicate a coordinate of the corresponding point.

For example, in the corresponding point information coding step, indexes may be allocated to the plurality of first feature points in a predetermined sequence, and the corresponding point information may indicate the index allocated to the corresponding point.
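Signalling an index instead of a coordinate works only if the encoder and the decoder order the first feature points identically. The sketch below assumes raster-scan order (top to bottom, then left to right) as the predetermined sequence and represents feature points as `(x, y)` tuples; the text does not fix a particular order, so this choice is illustrative.

```python
def raster_order(points):
    """Sort (x, y) feature points in raster-scan order: by row, then by column."""
    return sorted(points, key=lambda p: (p[1], p[0]))


def index_of(points, corresponding_point):
    """Encoder side: the index to code as corresponding point information."""
    return raster_order(points).index(corresponding_point)


def point_at(points, index):
    """Decoder side: recover the corresponding point from the decoded index."""
    return raster_order(points)[index]
```

Because both sides sort the same set of points, `point_at` returns the point the encoder chose even if the decoder's extraction produced the points in a different initial order.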

For example, in the generation step, an initial value of a motion estimation process may be set based on the relationship, and the prediction image may be generated by performing the motion estimation process using the initial value.
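Using the relationship as an initial value means motion estimation only has to search a small neighbourhood instead of the whole parameter space. As a simplified sketch, this assumes the initial value has already been reduced to an integer offset `(dx, dy)` and refines only the translation with a sum-of-absolute-differences (SAD) search within a hypothetical `radius`; refining rotation or scale would need a richer search.

```python
def sad(a, b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb))


def crop(img, x, y, size):
    """Return the size x size block of img whose top-left corner is (x, y)."""
    return [row[x:x + size] for row in img[y:y + size]]


def refine_offset(cur, ref, bx, by, size, init, radius=1):
    """Search integer offsets within +/-radius of init and return the best (dx, dy).

    cur, ref: current and reference images as lists of pixel rows.
    bx, by: top-left corner of the target block in cur.
    init: initial offset derived from the feature point relationship.
    """
    h, w = len(ref), len(ref[0])
    target = crop(cur, bx, by, size)
    best, best_cost = init, None
    for dy in range(init[1] - radius, init[1] + radius + 1):
        for dx in range(init[0] - radius, init[0] + radius + 1):
            x, y = bx + dx, by + dy
            if x < 0 or y < 0 or x + size > w or y + size > h:
                continue  # candidate block would leave the reference image
            cost = sad(target, crop(ref, x, y, size))
            if best_cost is None or cost < best_cost:
                best, best_cost = (dx, dy), cost
    return best
```

With a good initial value a radius of one or two pixels can suffice; a poor initial value would require a larger radius and therefore more computation.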

Still another aspect of the present disclosure provides an image decoding method in which the prediction image generation method is performed, the image decoding method including an image decoding step of decoding the target block using the prediction image.

Therefore, in the image decoding method, a reference region to which a deformation including the non-parallel translation component is applied can be determined using the feature point and the feature quantity, and the prediction image can be generated using the image of the reference region. As a result, the image decoding method can improve the coding efficiency.

For example, the image decoding method may further include a feature point information decoding step of decoding feature point information identifying the second feature point among a plurality of third feature points corresponding to the target block. In this case, the plurality of third feature points may be extracted in the extraction step, and the second feature point may be selected from the plurality of third feature points using the feature point information in the search step.

For example, the feature point information may indicate a coordinate of the second feature point.

For example, the feature point information may indicate a rotation quantity or a scale value of the second feature point.

For example, the image decoding method may further include a corresponding point information decoding step of decoding corresponding point information identifying the corresponding point among the plurality of first feature points. In this case, the corresponding point may be searched for among the plurality of first feature points using the corresponding point information in the search step.

For example, in the corresponding point information decoding step, indexes may be allocated to the plurality of first feature points in a predetermined sequence, and the corresponding point information may indicate the index allocated to the corresponding point.

Yet another aspect of the present disclosure provides a prediction image generation apparatus that generates a prediction image of a target block, the prediction image generation apparatus including: an extraction unit that extracts a plurality of first feature points each of which has a local feature quantity, the plurality of first feature points being included in a reconstructed image; a search unit that searches a corresponding point from the plurality of first feature points, the corresponding point having a local feature quantity similar to a local feature quantity of a second feature point corresponding to the target block, a relationship between the corresponding point and the second feature point being expressed by information including a non-parallel translation component; and a generation unit that generates the prediction image from the reconstructed image based on the relationship.

Therefore, in the prediction image generation apparatus, a reference region to which a deformation including the non-parallel translation component is applied can be determined using the feature point and the feature quantity, and the prediction image can be generated using the image of the reference region. As a result, the prediction image generation apparatus can improve the coding efficiency.

Note that these general or specific aspects may be implemented using a system, a method, an integrated circuit, a computer program, or a computer-readable recording medium such as a CD-ROM, or using any given combination of a system, a method, an integrated circuit, a computer program, and a computer-readable recording medium.

Exemplary embodiments will be described in detail below with reference to the drawings as needed. However, detailed descriptions of well-known items and overlapping descriptions of substantially identical configurations may be omitted. This avoids unnecessary redundancy in the following description and facilitates understanding by those skilled in the art.

Note that each of the following exemplary embodiments illustrates a specific example of the present disclosure. The numerical values, shapes, materials, elements, arrangement positions and connection forms of the elements, steps, order of the steps, and the like in the following exemplary embodiments are described only by way of example and do not restrict the present disclosure. Among the elements described in the following exemplary embodiments, elements that are not recited in an independent claim representing the most generic concept are described as optional elements.

First Exemplary Embodiment

An image coding apparatus that uses an image coding method according to a first exemplary embodiment will be described. The image coding apparatus of the first exemplary embodiment performs an intra prediction process and an inter prediction process using a local feature quantity. The image coding apparatus thereby generates a prediction image using a reference block expressed by information including a non-parallel translation component, that is, a component other than parallel translation, so that the coding efficiency can be improved.

FIG. 1 is a block diagram illustrating an example of image coding apparatus 100 of the first exemplary embodiment. Image coding apparatus 100 includes feature quantity extraction unit 101, block division unit 102, subtractor 103, frequency transform unit 104, quantization unit 105, entropy coding unit 106, inverse quantization unit 107, inverse frequency transform unit 108, adder 109, feature quantity extraction unit 110, intra prediction unit 111, loop filter 112, feature quantity extraction unit 113, frame memory 114, inter prediction unit 115, and switching unit 116.

Image coding apparatus 100 codes input image 121 to generate bitstream 126.

FIG. 2 is a flowchart of the image coding process performed by image coding apparatus 100 of the first exemplary embodiment.

Feature quantity extraction unit 101 extracts the feature points included in input image 121, which is a still image or a video image including at least one picture, and the feature quantity of each feature point, using a feature point and feature quantity extraction technique typified by SIFT (S101).

Image coding apparatus 100 divides input image 121 into coding blocks 122, which are units of the coding process (S102).

For each coding block 122, intra prediction unit 111 or inter prediction unit 115 generates prediction block 134 using decoded block 129 or decoded image 131 (S103). This process is described in detail later.

Subtractor 103 generates difference block 123, which is the difference between coding block 122 and prediction block 134 (S104). Frequency transform unit 104 performs a frequency transform on difference block 123 to generate coefficient block 124. Quantization unit 105 quantizes coefficient block 124 to generate coefficient block 125 (S105).

Entropy coding unit 106 performs entropy coding on coefficient block 125 to generate bitstream 126 (S106).

In order to generate decoded block 129 and decoded image 131, which are used to generate prediction block 134 for subsequent blocks or pictures, inverse quantization unit 107 performs inverse quantization on coefficient block 125 to generate coefficient block 127, and inverse frequency transform unit 108 performs an inverse frequency transform on coefficient block 127 to restore difference block 128 (S107).

Adder 109 adds prediction block 134 used in Step S103 and difference block 128 to generate decoded block 129 (a reconstructed image) (S108). Decoded block 129 is used in the intra prediction process performed by intra prediction unit 111.
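Steps S104 to S108 can be traced with a toy example. The sketch below is a deliberate simplification: it omits the frequency transform and entropy coding, and uses a single hypothetical integer quantization step `step` with rounding half away from zero. It illustrates why the encoder builds decoded block 129 from the quantized data rather than reusing the original block: the decoder only sees the quantized residual, so both sides must predict from the same reconstruction.

```python
def quantize_level(value, step):
    # Round half away from zero to the nearest multiple of step (integer level).
    sign = -1 if value < 0 else 1
    return sign * ((abs(value) + step // 2) // step)


def code_block(block, pred, step):
    """Return (levels, decoded) for one block; blocks are lists of pixel rows."""
    # S104: difference block = coding block - prediction block
    residual = [[b - p for b, p in zip(rb, rp)] for rb, rp in zip(block, pred)]
    # S105 (transform omitted): levels that would be entropy coded in S106
    levels = [[quantize_level(r, step) for r in row] for row in residual]
    # S107 (transform omitted): inverse quantization restores an approximate residual
    restored = [[q * step for q in row] for row in levels]
    # S108: decoded block = prediction block + restored difference block
    decoded = [[p + r for p, r in zip(rp, rr)] for rp, rr in zip(pred, restored)]
    return levels, decoded
```

With `step=1` the decoded block equals the coding block; larger steps trade reconstruction accuracy for fewer distinct levels.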

Using a technique similar to that in Step S101, feature quantity extraction unit 110 extracts the feature points included in decoded block 129, which is the decoding result of the region of input image 121 already coded at that time, and the feature quantities of those feature points (S109). The extracted feature points and feature quantities are used in intra prediction.

Image coding apparatus 100 determines whether the coding process of one whole image is completed (S110). When it is completed (Yes in S110), loop filter 112 performs a filtering process such as deblocking filtering on the plurality of decoded blocks 129 included in that image to reduce the image quality degradation caused by block distortion, and generates decoded image 131 (S111). Feature quantity extraction unit 113 extracts the feature points of decoded image 131 and their feature quantities using a technique similar to that in Steps S101 and S109 (S112). The extracted feature points and feature quantities are used in inter prediction.

Frame memory 114 stores decoded image 131. Decoded image 131 is used in the inter prediction process performed by inter prediction unit 115.

Image coding apparatus 100 repeats this series of processes until the coding process for all of input image 121, which is the input video image, is completed (S113).

The frequency transform and quantization in Step S105 may be performed sequentially as separate processes or collectively. Similarly, the inverse quantization and inverse frequency transform in Step S107 may be performed sequentially as separate processes or collectively.

Quantization refers to a process in which values sampled at predetermined intervals are digitized by associating them with respective predetermined levels. Inverse quantization refers to a process in which each value obtained through quantization is returned to a value in the original interval. In the data compression field, quantization refers to a process of classifying values into intervals coarser than the original ones, whereas inverse quantization refers to a process of re-classifying values in the coarser intervals into the original finer intervals. In the codec technology field, quantization and inverse quantization are sometimes called rounding or scaling.
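The definition above can be illustrated with a uniform scalar quantizer, the simplest way of mapping values to coarser intervals. This is an illustrative assumption, not the quantizer of any particular standard: each value is mapped to the index of the width-`step` interval it falls in (rounding half away from zero), and inverse quantization returns that interval's representative value, so the reconstruction error is at most half a step.

```python
def quantize(value, step):
    """Map value to an integer level: the index of its width-`step` interval."""
    sign = -1 if value < 0 else 1
    return sign * int((abs(value) + step / 2) // step)


def dequantize(level, step):
    """Return the representative value of the interval identified by level."""
    return level * step
```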

The prediction block generation process in Step S103 will be described in detail below with reference to the flowchart in FIG. 3.

Intra prediction unit 111 generates prediction block 130 through the intra prediction process (S121). Inter prediction unit 115 generates prediction block 132 through the inter prediction process (S122). Switching unit 116 calculates a cost for each of prediction blocks 130 and 132 obtained in Steps S121 and S122 using an R-D optimization model ((Eq. 1) below), selects the less costly technique, namely, the technique having higher coding efficiency, and outputs the prediction block corresponding to the selected technique as prediction block 134 (S123).

[Mathematical formula 1]

$$\mathrm{Cost} = D + \lambda \times R \qquad \text{(Eq. 1)}$$

In (Eq. 1), D denotes coding distortion and is, for example, the sum of absolute values of differences between the original pixel values of the pixels in the coding-target block and the values of the corresponding pixels in the generated prediction image. R denotes the generated code quantity and is, for example, the code quantity necessary for coding the motion information used to generate the prediction block. λ denotes the Lagrange multiplier. Selecting the appropriate prediction mode from between intra prediction and inter prediction in this way improves the coding efficiency.
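The selection in Step S123 can be sketched as follows, assuming D is the sum of absolute differences described above, R is a bit count supplied by the caller, and λ is a fixed illustrative value; in an actual encoder R would come from the entropy coder, and λ typically depends on the quantization parameter.

```python
def rd_cost(original, prediction, bits, lam):
    """(Eq. 1): Cost = D + lambda * R, with D as a sum of absolute differences."""
    d = sum(abs(o - p) for ro, rp in zip(original, prediction) for o, p in zip(ro, rp))
    return d + lam * bits


def select_mode(original, candidates, lam):
    """Return the mode name whose (prediction, bits) pair has the lowest cost."""
    return min(candidates, key=lambda mode: rd_cost(original, *candidates[mode], lam))
```

A small λ favours the mode with lower distortion, while a large λ favours the mode that needs fewer bits.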

In advance of the processes in Steps S121 and S122, image codingapparatus 100 may decide which one of the prediction processes is used,and perform only the process corresponding to the decided predictionprocess. Therefore, a processing quantity for the prediction imagegeneration process can be reduced.

Image coding apparatus 100 performs coding on information indicatingwhich one of the intra prediction and the inter prediction is used. Asused herein, the coding means that the information is provided to thecoding information, in other words, a coded bitstream including theinformation is generated.

The intra prediction process in Step S121 will be described in detailbelow with reference to a flowchart in FIG. 4.

Intra prediction unit 111 generates the prediction block by performingthe intra prediction process adopted in H.264/AVC and H.265/HEVC inwhich information on the adjacent already-coded block is used (S141).Hereinafter, the process is referred to as usual intra prediction.

Intra prediction unit 111 generates the prediction block through theintra prediction process in which a corresponding point of a featurepoint is used (S142). Hereinafter, the process is referred to as featurepoint use intra prediction.

Intra prediction unit 111 performs the process similar to that in StepS123 on the prediction blocks generated in Steps S141 and S142, selectsthe technique having the higher coding efficiency in the techniques ofSteps S141 and S142, and outputs the prediction block corresponding tothe selected technique as prediction block 130 (S143).

In advance of the processes in Steps S141 and S142, intra predictionunit 111 may decide which one of the prediction processes is used, andperform only the process corresponding to the decided predictionprocess. Therefore, a processing quantity for the prediction imagegeneration process can be reduced.

Image coding apparatus 100 performs the coding on information indicatingwhich one of the usual intra prediction process and the feature pointuse intra prediction process is used.

The feature point use intra prediction process in Step S142 will bedescribed in detail below with reference to a flowchart in FIG. 5.

Intra prediction unit 111 generates the prediction block by performingthe intra prediction process in which the feature point included in atarget block that is of a target in the prediction process is used(S161). Hereinafter, the prediction process is referred to as anin-block mode.

Intra prediction unit 111 performs an intra prediction process in which feature points existing in the already-coded region surrounding the target block are used (S162). Hereinafter, this prediction process is referred to as the surrounding block mode.

Intra prediction unit 111 performs a process similar to that in Step S123 on each of the prediction blocks generated in Steps S161 and S162, and selects whichever of the in-block mode and the surrounding block mode has the higher coding efficiency (S163).

Before performing Steps S161 and S162, intra prediction unit 111 may decide which prediction mode to use and perform only the process corresponding to the decided mode. This reduces the processing quantity of the prediction image generation process.

The in-block mode intra prediction process in Step S161 will be described in detail below with reference to the flowchart in FIG. 6.

Intra prediction unit 111 extracts the feature points existing in the target block from the feature points of input image 121 obtained in Step S101 (S181). When a feature point exists (Yes in S182), intra prediction unit 111 performs a corresponding point search process (S183). Specifically, among the feature points obtained in Step S109, which are extracted from the decoded blocks of the already-coded region in the same image as the target block, intra prediction unit 111 searches for a corresponding point, that is, a feature point whose feature quantity is similar to that of the feature point extracted in Step S181. Here, similarity means, for example, a small Euclidean distance between the feature quantities.
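The search in Step S183 can be sketched as a nearest-neighbour lookup. This is a minimal sketch, assuming feature quantities are plain real-valued vectors (as with SIFT) and assuming a hypothetical acceptance threshold `max_dist`; the description above only states that similarity means a small Euclidean distance. The function names are illustrative. For binary descriptors such as ORB, a Hamming distance would normally replace the Euclidean distance.

```python
import math
from typing import Optional, Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two feature-quantity vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def find_corresponding_point(
    target_desc: Sequence[float],
    candidate_descs: Sequence[Sequence[float]],
    max_dist: float,
) -> Optional[int]:
    """Return the index of the candidate closest to target_desc, or None.

    candidate_descs stands in for the feature quantities extracted from the
    already-coded region (Step S109). Returning None models "No in S184".
    """
    best_idx: Optional[int] = None
    best_dist = math.inf
    for idx, desc in enumerate(candidate_descs):
        dist = euclidean(target_desc, desc)
        if dist < best_dist:
            best_idx, best_dist = idx, dist
    # Hypothetical acceptance rule: reject a match that is not close enough.
    if best_dist > max_dist:
        return None
    return best_idx
```

For example, `find_corresponding_point([0, 0], [[3, 4], [1, 1], [10, 0]], 2.0)` selects index 1, whose distance is about 1.41.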

When a corresponding point is found in Step S183 (Yes in S184), intra prediction unit 111 performs the prediction image generation process using information on the relationship between the feature point and its corresponding point (S185).

FIG. 7 is a view illustrating the prediction image generation process, showing target image 150, which includes target block 153 subject to the prediction process. Target image 150 is partially coded and includes already-coded region 151, where decoded block 129 has been generated, and uncoded region 152, which has not yet been coded.

Intra prediction unit 111 sets a region around feature point 154 extracted in Step S181, including target block 153, as prediction target region 155. Intra prediction unit 111 also sets a region around corresponding point 156, which corresponds to feature point 154 and was found in Step S183, as reference region 157 used for prediction image generation. Then, intra prediction unit 111 generates the prediction image (prediction block) using pixel information on reference block 158, which is the part of reference region 157 located at the position corresponding to the position of target block 153 within prediction target region 155.

Here, local feature quantity extraction techniques such as SIFT and ORB, which are used as the feature point extraction technique for deciding reference region 157, are robust to scaling and rotation. Therefore, as illustrated in FIG. 7, intra prediction unit 111 can generate a prediction image that accounts for rotation and enlargement of prediction target region 155.

Specifically, each feature point has information on a rotation quantity and a scale value in addition to coordinate information. Intra prediction unit 111 generates the prediction image (prediction block) by transforming reference block 158 based on the differences between the rotation quantity and scale value of feature point 154 and those of corresponding point 156.
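The transform described above can be sketched as a similarity transform that maps each target-block pixel into the reference image. This is a minimal sketch under stated assumptions: angles are in radians, the scale "difference" is taken as the ratio of the two scale values, pixels are sampled by nearest neighbour, and out-of-range coordinates are clamped to the image edge. The names `map_to_reference` and `predict_block` are hypothetical.

```python
import math
from typing import List, Tuple

# Assumed keypoint layout: (x, y, rotation in radians, scale value).
Keypoint = Tuple[float, float, float, float]


def map_to_reference(px: float, py: float, feat: Keypoint, corr: Keypoint) -> Tuple[float, float]:
    """Map a target-image pixel into the reference image.

    The offset from feature point 154 is rotated by the rotation difference,
    scaled by the scale ratio, and re-anchored at corresponding point 156.
    """
    fx, fy, fa, fs = feat
    cx, cy, ca, cs = corr
    dtheta = ca - fa
    s = cs / fs
    dx, dy = px - fx, py - fy
    cos_t, sin_t = math.cos(dtheta), math.sin(dtheta)
    rx = cx + s * (cos_t * dx - sin_t * dy)
    ry = cy + s * (sin_t * dx + cos_t * dy)
    return rx, ry


def predict_block(ref: List[List[int]], bx: int, by: int, size: int,
                  feat: Keypoint, corr: Keypoint) -> List[List[int]]:
    """Build a size x size prediction block whose top-left pixel is (bx, by)."""
    h, w = len(ref), len(ref[0])
    block = []
    for y in range(by, by + size):
        row = []
        for x in range(bx, bx + size):
            rx, ry = map_to_reference(x, y, feat, corr)
            # Nearest-neighbour sampling, clamped to the image (simple padding).
            ix = min(max(int(round(rx)), 0), w - 1)
            iy = min(max(int(round(ry)), 0), h - 1)
            row.append(ref[iy][ix])
        block.append(row)
    return block
```

A real coder would more likely use sub-pixel interpolation; nearest-neighbour sampling is chosen here only to keep the sketch short.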

When reference region 157 extends outside target image 150, intra prediction unit 111 may perform a padding process in which the pixels of the region having no pixel information are generated by copying the pixel information near the edge of the image. This allows reference region 157 to be set by a simple process while suppressing quality degradation of the prediction image.

Instead of the padding process, intra prediction unit 111 may perform a folding process in which pixel information outside the screen is generated by mirroring the pixel information inside the image about the image edge. Alternatively, intra prediction unit 111 may fill pixels outside the image with a predetermined value. The former improves the quality of the prediction image because detailed information on the image can be used. The latter suppresses an increase in processing amount because it only substitutes a fixed value.
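The three out-of-image treatments (padding, folding, and filling with a predetermined value) can be compared in a minimal per-axis sketch. Assumptions: the fold repeats the edge pixel (index -1 maps to 0), and the predetermined value is a hypothetical 128 (mid-grey for 8-bit samples). Neither detail is fixed by the description above, and the function names are illustrative.

```python
from typing import List, Optional


def resolve_coord(i: int, n: int, mode: str) -> Optional[int]:
    """Map a possibly out-of-range index i onto [0, n) along one axis.

    Returns None when the sample should take the fixed fill value instead.
    """
    if 0 <= i < n:
        return i
    if mode == "pad":
        # Padding: repeat the nearest edge pixel.
        return min(max(i, 0), n - 1)
    if mode == "fold":
        # Folding: mirror about the image edge (edge pixel repeated), period 2n.
        period = 2 * n
        i %= period
        return i if i < n else period - 1 - i
    if mode == "fill":
        return None
    raise ValueError(f"unknown mode: {mode}")


def sample(img: List[List[int]], x: int, y: int, mode: str, fill_value: int = 128) -> int:
    """Read pixel (x, y), applying the chosen out-of-image treatment."""
    h, w = len(img), len(img[0])
    rx, ry = resolve_coord(x, w, mode), resolve_coord(y, h, mode)
    if rx is None or ry is None:
        return fill_value
    return img[ry][rx]
```

With the one-row image `[[1, 2, 3, 4]]`, x = -2 yields 1 when padding, 2 when folding, and 128 when filling.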

Intra prediction unit 111 adds feature-point-related information to the coding information (S186).

When no feature point exists in the target block (No in S182), or when no corresponding point exists (No in S184), intra prediction unit 111 performs an exception process for the case in which no corresponding point exists (S187). Specifically, intra prediction unit 111 sets the cost value of this technique to the maximum value; for example, it sets D and R in (Eq. 1) to infinity. As a result, the technique in Step S161 is not selected in Step S163.
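(Eq. 1) is not reproduced in this section. The sketch below assumes it is the common Lagrangian rate-distortion cost J = D + λR and shows how setting D and R to infinity keeps an unavailable technique from being selected. The class and function names and the λ value are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Candidate:
    name: str
    distortion: float  # D
    rate: float        # R


def rd_cost(c: Candidate, lam: float) -> float:
    """Assumed form of (Eq. 1): J = D + lambda * R."""
    # Guard: an unavailable technique must never win, even when lam == 0
    # (0 * inf would be NaN in floating point).
    if math.isinf(c.distortion) or math.isinf(c.rate):
        return math.inf
    return c.distortion + lam * c.rate


def unavailable(name: str) -> Candidate:
    """Exception process (S187 or S227): D and R set to infinity."""
    return Candidate(name, math.inf, math.inf)


def select_best(candidates: Sequence[Candidate], lam: float) -> Candidate:
    """Pick the candidate with the lowest cost, as in S143 or S163."""
    return min(candidates, key=lambda c: rd_cost(c, lam))
```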

The method of adding the feature-point-related information to the coding information in Step S186 will be described in detail below with reference to the flowchart in FIG. 8.

Here, the feature-point-related information includes corresponding point information, which is information on the corresponding point, and feature point information, which is information on the feature point in the target block.

Intra prediction unit 111 adds corresponding point information, that is, the information necessary to find the corresponding point, to the coding information (S201). Specifically, the corresponding point information indicates the coordinate of the corresponding point corresponding to the feature point included in the target block.

Intra prediction unit 111 determines whether the feature point information, that is, the information on the feature point included in the target block, matches a defined value (S202). As used herein, the feature point information means the coordinate of the feature point in the block, the rotation quantity and scale value of the feature point, and the like. For example, the defined value of the coordinate is the center coordinate of the target block, the defined value of the rotation quantity is 0 degrees, and the defined value of the scale value is 1.

When all pieces of feature point information for the target block match the defined values (Yes in S202), intra prediction unit 111 turns off a detailed information flag, which indicates that detailed information on the feature point exists, and adds the detailed information flag to the coding information (S203).

When at least one piece of feature point information for the target block does not match the defined value (No in S202), intra prediction unit 111 turns on the detailed information flag and adds it to the coding information (S204). Intra prediction unit 111 then adds detailed information, such as the coordinate and rotation quantity of the feature point included in the target block, to the coding information (S205).
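Steps S201 to S205 can be sketched as producing a list of syntax elements. This is a minimal sketch, not the bitstream syntax: the element names are hypothetical, entropy coding is omitted, and the feature point coordinate is assumed to be expressed as an offset from the block center so that its defined value is (0, 0).

```python
from typing import Dict, List, Tuple

# Defined values assumed for this sketch: the feature point sits at the block
# center (offset 0, 0) with rotation 0 degrees and scale value 1.
DEFINED = {"offset_x": 0, "offset_y": 0, "rotation_deg": 0.0, "scale": 1.0}


def write_in_block_info(corr_xy: Tuple[int, int],
                        feat: Dict[str, float]) -> List[Tuple[str, float]]:
    """Return (name, value) pairs for Steps S201 to S205."""
    # S201: corresponding point information (here, its coordinate).
    syntax: List[Tuple[str, float]] = [("corr_x", corr_xy[0]), ("corr_y", corr_xy[1])]
    if all(feat[k] == v for k, v in DEFINED.items()):  # S202
        syntax.append(("detail_flag", 0))  # S203: flag off, nothing else sent
    else:
        syntax.append(("detail_flag", 1))  # S204: flag on
        syntax.extend((k, feat[k]) for k in DEFINED)  # S205: detailed information
    return syntax
```

When the feature point matches all defined values, only three elements are written; otherwise the four detailed values follow the flag.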

The corresponding point information in Step S201 is not limited to the above. It may be an index that uniquely identifies the corresponding point within the feature point group (the feature points extracted from the whole already-coded region) that includes the corresponding point. For example, the index may be assigned so that its value decreases as the feature point comes closer to the target block. Alternatively, the index may be assigned so that its value decreases as the reliability of the feature quantity increases. Whereas a coordinate requires two-dimensional information, an index indicates the corresponding point with one-dimensional information.
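The distance-based index can be sketched as follows, assuming the decoder extracts the same feature points from the same reconstructed region and therefore rebuilds the same ordering. Breaking ties by original position is an assumption added so that encoder and decoder agree; the function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def distance_order(points: Sequence[Point], block_center: Point) -> List[int]:
    """Feature point indices sorted so the point nearest the target block comes first.

    Ties are broken by original position so that encoder and decoder agree.
    """
    cx, cy = block_center
    return sorted(
        range(len(points)),
        key=lambda i: ((points[i][0] - cx) ** 2 + (points[i][1] - cy) ** 2, i),
    )


def encode_index(points: Sequence[Point], block_center: Point, corr_idx: int) -> int:
    """Encoder side: replace the corresponding point by its rank (small near the block)."""
    return distance_order(points, block_center).index(corr_idx)


def decode_index(points: Sequence[Point], block_center: Point, coded_index: int) -> int:
    """Decoder side: recover the corresponding point from the coded rank."""
    return distance_order(points, block_center)[coded_index]
```

For the points (10, 10), (1, 1), and (5, 5) around a block centered at (0, 0), the point at (1, 1) receives index 0.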

The defined values in Step S202 are not limited to the above. For example, the defined value of the coordinate may be the upper-left coordinate of the target block, and the defined value of the rotation quantity may be the rotation quantity of the whole image.

The detailed information may indicate an absolute or relative value of the coordinate, rotation quantity, or scale value. That is, the detailed information may indicate the difference between the defined value of the coordinate, rotation quantity, or scale value and the corresponding value of the feature point used. Alternatively, the detailed information may indicate the difference in coordinate, rotation quantity, or scale value between the corresponding point and the feature point. In that case, the defined value in Step S202 may be a defined value of that difference.

Although the determination in Step S202 is described as checking whether all elements, such as the coordinate, the rotation quantity, and the scale value, match their defined values, intra prediction unit 111 may instead perform the determination for each element and code a flag for each element. Alternatively, intra prediction unit 111 may divide the elements into groups, perform the determination for each group, and code a flag for each group. When there are two or more flags, intra prediction unit 111 performs Steps S202 to S205 once per flag. For example, intra prediction unit 111 may code one flag for the coordinate information and another flag for the rotation quantity and scale value. This allows flexible settings during coding, which can improve coding efficiency.

The surrounding block mode intra prediction process in Step S162 will be described in detail below with reference to the flowchart in FIG. 9.

From the feature points obtained in Step S109, intra prediction unit 111 extracts the feature points included in an adjacent region, that is, the decoded image of the already-coded region adjacent to the target block (S221). When at least one feature point is extracted in Step S221 (Yes in S222), intra prediction unit 111 takes, from the feature points obtained in Step S109, those extracted from the decoded image of the already-coded region in the same image as the target block, and searches among them for a corresponding point of each extracted feature point (S223).

When at least one corresponding point exists in Step S223 (Yes in S224), intra prediction unit 111 generates the prediction image from the corresponding point information (S225).

FIG. 10 is a view illustrating this process, showing target image 150, which includes target block 153 subject to the prediction process. Target image 150 is partially coded and includes already-coded region 151, where decoded block 129 has been generated, and uncoded region 152, which has not yet been coded.

Intra prediction unit 111 selects at least one feature point 164 from the feature point group that was obtained in Step S221 and exists in the region surrounding the target block, and sets prediction target region 165 so that it includes both feature point 164 and target block 153. Intra prediction unit 111 sets the region including corresponding point 166 of feature point 164 in prediction target region 165 as reference region 167. In setting reference region 167, intra prediction unit 111 performs a geometric transform, such as an affine transform, using the plurality of corresponding points 166. This improves the quality of the prediction image and thus the coding efficiency. Then, intra prediction unit 111 generates the prediction image (prediction block) using pixel information on reference block 168, which is the part of reference region 167 located at the position corresponding to the position of target block 153 within prediction target region 165.
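The description names an affine transform but does not say how it is estimated from corresponding points 166. As one assumed option, the sketch below fits the six affine parameters to three or more feature-point/corresponding-point pairs by ordinary least squares (normal equations solved by Gaussian elimination), then maps a target-image coordinate into the reference region. All names are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Affine = Tuple[float, float, float, float, float, float]  # (a, b, c, d, e, f)


def _solve3(m: List[List[float]], v: List[float]) -> List[float]:
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    aug = [row[:] + [rhs] for row, rhs in zip(m, v)]
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(aug[r][col]))
        if abs(aug[piv][col]) < 1e-12:
            raise ValueError("degenerate configuration (e.g. collinear points)")
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, 3):
            factor = aug[r][col] / aug[col][col]
            for k in range(col, 4):
                aug[r][k] -= factor * aug[col][k]
    sol = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):  # back substitution
        acc = sum(aug[r][k] * sol[k] for k in range(r + 1, 3))
        sol[r] = (aug[r][3] - acc) / aug[r][r]
    return sol


def estimate_affine(src: Sequence[Point], dst: Sequence[Point]) -> Affine:
    """Least-squares fit of u = a*x + b*y + c and v = d*x + e*y + f.

    src: feature points 164 in the target image; dst: corresponding points 166.
    """
    if len(src) != len(dst) or len(src) < 3:
        raise ValueError("need at least three point pairs")
    ata = [[0.0] * 3 for _ in range(3)]
    atu = [0.0] * 3
    atv = [0.0] * 3
    for (x, y), (u, v) in zip(src, dst):
        row = (x, y, 1.0)
        for i in range(3):
            atu[i] += row[i] * u
            atv[i] += row[i] * v
            for j in range(3):
                ata[i][j] += row[i] * row[j]
    a, b, c = _solve3(ata, atu)
    d, e, f = _solve3(ata, atv)
    return (a, b, c, d, e, f)


def apply_affine(p: Affine, x: float, y: float) -> Point:
    """Map a target-image coordinate into the reference image."""
    a, b, c, d, e, f = p
    return (a * x + b * y + c, d * x + e * y + f)
```

Each pixel of target block 153 could then be passed through `apply_affine` and sampled from the reference region, as in the single-point case.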

Intra prediction unit 111 need not generate the prediction image using all the feature points for which corresponding points are found; it may instead try different combinations of corresponding points and choose the combination with the best coding efficiency. This further improves coding efficiency. When reference region 167 extends outside target image 150, intra prediction unit 111 may perform an out-of-region process such as the padding process described for Step S185. This improves coding efficiency.
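The combination search can be sketched as an exhaustive loop over subsets of the matched pairs. This is a minimal sketch: `cost_fn` is a hypothetical callback that would, in practice, build the prediction from the given subset and return its coding cost (for an affine fit, `min_size=3`). Because the number of subsets grows as 2^n, a practical encoder would probably bound the search.

```python
import math
from itertools import combinations
from typing import Callable, Optional, Sequence, Tuple, TypeVar

T = TypeVar("T")


def best_combination(
    pairs: Sequence[T],
    cost_fn: Callable[[Tuple[T, ...]], float],
    min_size: int = 1,
) -> Tuple[Optional[Tuple[T, ...]], float]:
    """Try every subset with at least min_size pairs and keep the cheapest one."""
    best: Optional[Tuple[T, ...]] = None
    best_cost = math.inf
    for k in range(min_size, len(pairs) + 1):
        for subset in combinations(pairs, k):
            cost = cost_fn(subset)
            if cost < best_cost:
                best, best_cost = subset, cost
    return best, best_cost
```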

Then, intra prediction unit 111 adds the feature-point-related information, that is, the information on the feature points needed for the process in Step S225, to the coding information (S226).

When no feature point exists in the region adjacent to the target block (No in S222), or when no corresponding point exists (No in S224), intra prediction unit 111 performs the exception process for the case in which no corresponding point exists (S227). The process in Step S227 is similar to that in Step S187 and ensures that the technique in Step S162 is not selected.

The process of adding the feature-point-related information to the coding information in Step S226 will be described in detail below with reference to the flowchart in FIG. 11.

Here, the feature-point-related information includes extraction range information indicating extraction region 175, which includes feature points 174 surrounding target block 153; number information indicating how many of feature points 174 extracted in extraction region 175 are used; and feature point designating information designating which of feature points 174 extracted in extraction region 175 are used.

Intra prediction unit 111 adds to the coding information the extraction range information necessary to designate extraction region 175, a rectangular region that includes all feature points 174 surrounding target block 153 that are used in Step S225 (S241). For example, the extraction range information defines the width and height of extraction region 175, namely width 171 and height 172 in FIG. 12.

Intra prediction unit 111 adds the number information, which indicates the number of feature points (corresponding points) used in Step S225, to the coding information (S242). In other words, the number information indicates the number of feature points (corresponding points) in the set of feature points used.

When all feature points 174 extracted in extraction region 175 designated by the information in Step S241 are used (Yes in S243), intra prediction unit 111 turns off the detailed information flag, which indicates that detailed information on the feature points exists, and adds the detailed information flag to the coding information (S244).

On the other hand, when not all the feature points extracted in extraction region 175 are used (No in S243), intra prediction unit 111 turns on the detailed information flag and adds it to the coding information (S245). Intra prediction unit 111 then adds feature point designating information, which designates each feature point used among feature points 174 extracted in extraction region 175, to the coding information once for each feature point used (S246). For example, the feature point designating information indicates the coordinate of the feature point.
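Steps S241 to S246 can be sketched in the same syntax-list style. This is a minimal sketch: the element names are hypothetical, entropy coding is omitted, and the feature point designating information is assumed to be a coordinate, which is the example the description gives.

```python
from typing import List, Sequence, Tuple

Point = Tuple[int, int]


def write_surrounding_info(width: int, height: int,
                           extracted: Sequence[Point],
                           used: Sequence[Point]) -> List[Tuple[str, int]]:
    """Return (name, value) pairs for Steps S241 to S246."""
    syntax = [("ext_width", width), ("ext_height", height)]  # S241: extraction range
    syntax.append(("num_points", len(used)))  # S242: number information
    if sorted(used) == sorted(extracted):  # S243: every extracted point is used
        syntax.append(("detail_flag", 0))  # S244
    else:
        syntax.append(("detail_flag", 1))  # S245
        for x, y in used:  # S246: one designation per used feature point
            syntax += [("point_x", x), ("point_y", y)]
    return syntax
```

If every extracted point is used, the decoder can recover them all from the extraction range alone, so no coordinates are written.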

The coding in Steps S241, S242, and S243 to S246 is not limited to the above order and may be changed as appropriate.

The width and height of extraction region 175 in Step S241 may be designated in units of coding blocks, images, or sequences. Alternatively, fixed values may be used. When the width and height are changed block by block, the range can be set flexibly, which can improve the quality of the prediction image. When the width and height are designated per image, the amount of coding information decreases, which improves coding efficiency.

The width and height of extraction region 175 may be set to different values or to the same value. Different values make it easier to select effective feature points, which can improve the quality of the prediction image. The same value reduces the code quantity, because only one value needs to be coded.

The extraction range information in Step S241 may express the width and height as numbers of pixels or as numbers of already-coded blocks. In the latter case, the number of blocks in the width direction and the number of blocks in the height direction may be designated separately, or the same value may be used for both. Using the number of blocks reduces the code quantity, because distance information such as the width can be expressed by a smaller value.
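The conversion from pixels to block counts is simple arithmetic; for example, a 40-pixel width with 16-pixel blocks becomes 3 blocks. The sketch below assumes counts are rounded up so the block-aligned region still covers the requested pixels, and that a shared value takes the larger of the two counts; both choices, and the function name, are assumptions.

```python
from typing import Tuple


def extent_in_blocks(width_px: int, height_px: int, block_size: int,
                     shared: bool = False) -> Tuple[int, ...]:
    """Express an extraction-region extent as a count of already-coded blocks."""
    blocks_w = -(-width_px // block_size)   # ceiling division
    blocks_h = -(-height_px // block_size)
    if shared:
        # One value for both directions (fewer values to code).
        return (max(blocks_w, blocks_h),)
    return (blocks_w, blocks_h)
```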

The inter prediction process in Step S122, performed by inter prediction unit 115, will be described in detail below with reference to the flowchart in FIG. 13.

Inter prediction unit 115 generates the prediction image by performing the inter prediction process using motion information, as in H.264/AVC or H.265/HEVC, which are existing video coding schemes (S261). Hereinafter, this process is referred to as usual inter prediction.

Inter prediction unit 115 generates the prediction image by performing the inter prediction process in which the feature point is used (S262). Hereinafter, this process is referred to as feature point use inter prediction.

Inter prediction unit 115 performs a process similar to that in Step S123 on each of the prediction images obtained in Steps S261 and S262, thereby selecting the technique having the higher coding efficiency (S263).

The feature point use inter prediction process in Step S262 will be described in detail below with reference to the flowchart in FIG. 14.

Inter prediction unit 115 generates the prediction image by performing the inter prediction process in which the feature point included in the target block is used (S281). Hereinafter, this prediction process is referred to as the in-block mode.

Inter prediction unit 115 performs the inter prediction process in which feature points existing in the already-coded region surrounding the target block are used (S282). Hereinafter, this prediction process is referred to as the surrounding block mode.

Inter prediction unit 115 performs a process similar to that in Step S123 on each of the prediction images generated in Steps S281 and S282, thereby selecting the technique having the higher coding efficiency (S283).

The in-block mode inter prediction process in Step S281 will be described in detail below with reference to the flowchart in FIG. 15.

Because the processes in Steps S301, S302, S304, and S306 are similar to those in Steps S181, S182, S184, and S186, their detailed description is omitted.

When a feature point exists in the target block (Yes in S302), inter prediction unit 115 searches for the corresponding point of the feature point extracted in Step S301 among the feature points extracted from at least one reference image in Step S112 (S303).

When a corresponding point exists in Step S303 (Yes in S304), inter prediction unit 115 generates the prediction image using the result obtained in Step S303 (S305).

FIG. 16 is a view illustrating this process, showing target image 150, which includes target block 153 subject to the prediction process, and reference image 180, an already-coded picture other than target image 150. Target image 150 is partially coded and includes already-coded region 151, where decoded block 129 has been generated, and uncoded region 152, which has not yet been coded.

Inter prediction unit 115 sets prediction target region 155 by a technique similar to that used in Step S185. Inter prediction unit 115 sets reference region 181 around corresponding point 182 obtained in Step S303, by a technique similar to that used to set reference region 157 in Step S185. Inter prediction unit 115 then generates the prediction image from reference region 181 by a technique similar to that in Step S185. That is, inter prediction unit 115 generates the prediction image (prediction block) using pixel information on reference block 183, which is the part of reference region 181 located at the position corresponding to the position of target block 153 within prediction target region 155.

In Step S307, inter prediction unit 115 performs a process similar to those in Steps S187 and S227 so that the technique in Step S281 is not selected in Step S283.

The surrounding block mode inter prediction process in Step S282 will be described in detail below with reference to the flowchart in FIG. 17.

Because the processes in Steps S321, S322, S324, and S326 are similar to those in Steps S221, S222, S224, and S226, their detailed description is omitted.

When a feature point exists in the surrounding already-coded blocks (Yes in S322), inter prediction unit 115 searches, for each feature point extracted in Step S321, for a corresponding point among the feature points extracted from at least one reference image in Step S112 (S323). When a corresponding point exists (Yes in S324), inter prediction unit 115 generates the prediction image using the corresponding point result obtained in Step S323 (S325).

FIG. 18 is a view illustrating this process, showing target image 150, which includes target block 153 subject to the prediction process, and reference image 180, an already-coded picture other than target image 150. Target image 150 is partially coded and includes already-coded region 151, where decoded block 129 has been generated, and uncoded region 152, which has not yet been coded.

Inter prediction unit 115 sets prediction target region 165 by a technique similar to that used in Step S225. Then, inter prediction unit 115 sets reference region 191 using corresponding point 192 obtained in Step S323, by a technique similar to that used to set reference region 167 in Step S225. Specifically, inter prediction unit 115 sets the region including corresponding point 192 of feature point 164 in prediction target region 165 as reference region 191.

Then, inter prediction unit 115 generates the prediction image from reference region 191 by a technique similar to that in Step S225. Specifically, inter prediction unit 115 generates the prediction image (prediction block) using pixel information on reference block 193, which is the part of reference region 191 located at the position corresponding to the position of target block 153 within prediction target region 165.

In Step S327, inter prediction unit 115 performs a process similar to those in Steps S187, S227, and S307 so that the technique in Step S282 is not selected in Step S283.

As described for Step S201, the information necessary to search for the corresponding point in Step S321 may be an index that uniquely identifies the corresponding point within the feature point group (the feature points extracted from the whole already-coded region) that includes the corresponding point.

Advantageous Effect

As described above, according to the first exemplary embodiment, applying technology concerning local feature quantities makes it possible to generate, in both intra prediction and inter prediction, a prediction image that follows higher-order motion such as rotation and enlargement, with less information than existing techniques require.

In the first exemplary embodiment, SIFT is described as an example of the feature quantity extraction technique used in Steps S101, S109, and S112. However, the feature quantity and its extraction technique are not limited to SIFT. For example, ORB or SURF (Speeded Up Robust Features), which are other local feature quantity technologies, may be used.

The feature point extraction technique may differ from the feature quantity calculation technique. Each technique has different characteristics; for example, one may require only a small processing quantity, while another may be robust to affine transforms in addition to scaling. Therefore, coding efficiency can be improved by changing the local feature quantity and its extraction technique according to the kind of video to be coded.

In Steps S202 to S205, the rotation quantity and scale value of the feature point are cited, in addition to the coordinate, as examples of detailed information on the feature point. However, the information that can be used as detailed information is not limited to the coordinate, rotation quantity, and scale value.

It is not always necessary to code all of the above pieces of information. For example, FAST, a technique that extracts only the position of a feature point, is not robust to scaling and rotation, so image coding apparatus 100 does not need to code the rotation quantity and scale value when FAST is used. In this way, image coding apparatus 100 can code only the information necessary to generate the prediction image by changing the information to be coded according to the feature quantity used.

A plurality of feature points, rather than one, may be extracted in Steps S181 and S301. In this case, image coding apparatus 100 may generate the prediction image using information on the plurality of feature points, which improves the quality of the prediction image and thus coding efficiency. To do this in Steps S161 and S281, image coding apparatus 100 searches for the corresponding point of each feature point (S183 and S303), generates the prediction image using the relationship between the plurality of feature points and the plurality of corresponding points (S185 and S305), and adds information on each feature point to the coding information (S186 and S306). For example, image coding apparatus 100 can find the corresponding points of the feature points by a process similar to that in FIGS. 10 and 18, and decide the reference region and reference block that include those corresponding points. The prediction image generation process using the relationship between a plurality of feature points and a plurality of corresponding points is not limited to this technique.

In the first exemplary embodiment, the above technique is used, by way of example, in both inter prediction and intra prediction. Alternatively, it may be used in only one of them. Likewise, the first exemplary embodiment describes both the prediction process (inter or intra prediction) that uses feature points in the target block and the prediction process (inter or intra prediction) that uses feature points surrounding the target block. Alternatively, only one of these two prediction processes may be used.

In the first exemplary embodiment, the corresponding point is selected from the feature points included in the reconstructed image (decoded block 129 or decoded image 131). Alternatively, the corresponding point may be selected from the feature points included in input image 121. For example, image coding apparatus 100 may search for the corresponding point among the feature points of input image 121 when no corresponding point exists in the reconstructed image.

Second Exemplary Embodiment

A second exemplary embodiment describes modifications of image coding apparatus 100 and the image coding method of the first exemplary embodiment.

Because the processes other than the usual inter prediction process using motion information in Step S261 are similar to those of the first exemplary embodiment, only Step S261 is described.

The inter prediction process using motion information in Step S261 will be described in detail with reference to the flowchart in FIG. 19.

Inter prediction unit 115 performs a motion information estimation process (S341). Inter prediction unit 115 then performs a motion compensation process using the motion information obtained in Step S341 (S342). Then, inter prediction unit 115 generates difference motion information, that is, the difference between prediction motion information (the motion information of an already-coded temporally or spatially adjacent block) and the motion information decided in Step S341 (S343).
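Step S343 amounts to an element-wise subtraction, which a decoder can invert by adding the same prediction back. The minimal sketch below works for a two-parameter translation vector as well as a six-parameter affine parameter set; the function names are hypothetical.

```python
from typing import Sequence, Tuple


def motion_difference(mv: Sequence[float], mvp: Sequence[float]) -> Tuple[float, ...]:
    """Difference motion information (S343): element-wise mv - mvp."""
    if len(mv) != len(mvp):
        raise ValueError("motion information must have the same number of parameters")
    return tuple(a - b for a, b in zip(mv, mvp))


def reconstruct_motion(mvd: Sequence[float], mvp: Sequence[float]) -> Tuple[float, ...]:
    """Decoder-side inverse: mv = mvp + mvd."""
    return tuple(p + d for p, d in zip(mvp, mvd))
```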

The motion information estimation process in Step S341 will be described in detail with reference to the flowchart in FIG. 20.

Inter prediction unit 115 extracts the feature points included in the target block from the feature points extracted from the input image in Step S101 (S361). Inter prediction unit 115 then searches for the corresponding point of each feature point extracted in Step S361 in the set of feature points that were extracted in Step S101 when the reference image used in inter prediction was the input image (S362). The detailed process is similar to the search process described in the first exemplary embodiment, except that a different set of feature points is used.

When the corresponding point is successfully found in Step S362 (Yes in S363), inter prediction unit 115 sets the initial value of the motion estimation process using the corresponding point information (S365). Because the relationship between corresponding points obtained by a local feature quantity extraction technique such as SIFT or ORB provides not only the parallel translation component but also the rotation quantity and scaling, initial values can be set for the parallel translation component as well as for the rotation quantity and scaling.

On the other hand, when the search for the corresponding point fails in Step S362 (No in S363), inter prediction unit 115 sets a defined value as the initial value of the motion estimation process (S364). For example, the defined value of the parallel translation component is 0, and the defined value of the scaling parameter is 1. Alternatively, inter prediction unit 115 may roughly estimate low-order motion information and use the result as the initial value of the high-order motion information estimation process. For example, when estimating the six-dimensional affine motion information, inter prediction unit 115 uses the estimated two-dimensional parallel translation vector as the initial value.
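The initial values in Steps S364 and S365 can be written as six affine parameters. This is a minimal sketch, assuming angles in radians, a scale ratio rather than a scale difference, and a translation chosen so that the feature point lands exactly on its corresponding point. The identity parameters encode the defined values (translation 0, scale 1), and `from_translation` shows the low-order initialization mentioned above. All names are hypothetical.

```python
import math
from typing import Optional, Tuple

Keypoint = Tuple[float, float, float, float]  # (x, y, rotation in radians, scale value)
Affine = Tuple[float, float, float, float, float, float]  # u = a*x + b*y + c, v = d*x + e*y + f

IDENTITY: Affine = (1.0, 0.0, 0.0, 0.0, 1.0, 0.0)  # S364: translation 0, scale 1


def initial_affine(feat: Keypoint, corr: Optional[Keypoint]) -> Affine:
    """Initial value for the affine motion search (S364 or S365)."""
    if corr is None:  # No in S363: fall back to the defined values
        return IDENTITY
    fx, fy, fa, fs = feat
    cx, cy, ca, cs = corr
    s = cs / fs      # scale ratio
    t = ca - fa      # rotation difference
    a, b = s * math.cos(t), -s * math.sin(t)
    d, e = s * math.sin(t), s * math.cos(t)
    # Translation chosen so that the feature point maps onto its corresponding point.
    c = cx - (a * fx + b * fy)
    f = cy - (d * fx + e * fy)
    return (a, b, c, d, e, f)


def from_translation(tx: float, ty: float) -> Affine:
    """Use a rough 2-D translation estimate as the initial 6-D affine value."""
    return (1.0, 0.0, tx, 0.0, 1.0, ty)
```

For a pure translation from (2, 3) to (7, 1) with no rotation or scaling, `initial_affine` yields c = 5 and f = -2 with an identity linear part.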

Inter prediction unit 115 performs the motion estimation process using the initial value set in Step S364 or S365 (S366).

Advantageous Effect

As described above, according to the second exemplary embodiment, the relationship between corresponding points of the feature quantities allows effective initial values to be set for the estimation of high-order motion information, including affine transforms. This speeds up the motion information estimation process and can improve the quality of the prediction image generated from the estimation result.

In Step S362, inter prediction unit 115 may instead search for the corresponding point in the set of feature points extracted from the decoded reference image in Step S112. This eliminates the need to store the feature point set obtained from the input image, reducing the amount of memory used during the process.

In the second exemplary embodiment, the process of the second exemplary embodiment is performed, by way of example, in addition to the technique of the first exemplary embodiment. Alternatively, only the processes of the second exemplary embodiment may be performed.

Third Exemplary Embodiment

In a third exemplary embodiment, an image decoding apparatus that decodes a bitstream generated by image coding apparatus 100 will be described.

FIG. 21 is a block diagram illustrating an example of image decoding apparatus 200 of the third exemplary embodiment. Image decoding apparatus 200 includes entropy decoding unit 201, inverse quantization unit 202, inverse frequency transform unit 203, adder 204, feature quantity extraction unit 205, intra prediction unit 206, loop filter 207, frame memory 208, feature quantity extraction unit 209, inter prediction unit 210, and switching unit 211.

Image decoding apparatus 200 generates decoded image 227 by performing a decoding process on input bitstream 221. For example, bitstream 221 is generated by image coding apparatus 100. The various pieces of information included in bitstream 221 have the same meanings as in the first exemplary embodiment.

FIG. 22 is a flowchart of the image decoding process performed by image decoding apparatus 200 of the third exemplary embodiment.

Entropy decoding unit 201 decodes prediction information from bitstream 221, which is obtained by coding a still or video image including at least one picture (S401). Entropy decoding unit 201 also decodes coefficient block 222 from bitstream 221 (S402).

Then, inverse quantization unit 202 performs inverse quantization on coefficient block 222 to generate coefficient block 223. Inverse frequency transform unit 203 performs the inverse frequency transform on coefficient block 223 to restore difference block 224 (S403).

Then, intra prediction unit 206 or inter prediction unit 210 generates prediction block 230 using the prediction information decoded in Step S401 and decoded block 225 or decoded image 227 (S404). Specifically, intra prediction unit 206 generates prediction block 226 through the intra prediction process, and inter prediction unit 210 generates prediction block 229 through the inter prediction process. Switching unit 211 outputs one of prediction blocks 226 and 229 as prediction block 230.

Adder 204 adds difference block 224 obtained in Step S403 and prediction block 230 obtained in Step S404 to generate decoded block 225 (S405). Decoded block 225 is used in the intra prediction process performed by intra prediction unit 206.
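The addition in Step S405 can be sketched as a per-sample sum. The clipping to the valid sample range for a given bit depth is an assumption borrowed from common practice in standards such as H.265/HEVC; the text above only states that the two blocks are added. `reconstruct_block` is a hypothetical name.

```python
from typing import List

Block = List[List[int]]


def reconstruct_block(residual: Block, prediction: Block, bit_depth: int = 8) -> Block:
    """Add the difference block to the prediction block sample by sample,
    clipping each result to [0, 2**bit_depth - 1] (assumed clipping rule)."""
    max_val = (1 << bit_depth) - 1
    return [
        [min(max(p + r, 0), max_val) for p, r in zip(pred_row, res_row)]
        for pred_row, res_row in zip(prediction, residual)
    ]
```

For example, `reconstruct_block([[10, -20], [300, 0]], [[100, 10], [0, 255]])` yields `[[110, 0], [255, 255]]`.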

Feature quantity extraction unit 205 extracts feature points and their feature quantities from the set of decoded blocks 225 decoded up to Step S405 (S406). The extracted feature points and feature quantities are used in the intra prediction. The detailed process is similar to the process in Step S109.

Then, image decoding apparatus 200 determines whether all decoded blocks 225 of one image have been decoded (S407). When the decoding process for the image is not completed (No in S407), image decoding apparatus 200 returns to Step S401 to decode the next block.

On the other hand, when all decoded blocks 225 of one image have been decoded (Yes in S407), loop filter 207 performs the filtering process on the decoded image (S408). Specifically, loop filter 207 applies a filter such as a deblocking filter to the plurality of decoded blocks 225 included in the image to reduce the image quality degradation caused by block distortion, and generates decoded image 227.

Frame memory 208 stores decoded image 227, which is used in the inter prediction process performed by inter prediction unit 210.

Feature quantity extraction unit 209 extracts feature points and their feature quantities from decoded image 227 (S409). The detailed process is similar to the process in Step S111.

Image decoding apparatus 200 determines whether all blocks included in input bitstream 221 have been decoded (S410). Specifically, image decoding apparatus 200 determines that all the blocks have been decoded when input bitstream 221 ends.

When not all the blocks have been decoded (No in S410), image decoding apparatus 200 returns to Step S401 to decode the next block. On the other hand, when all the blocks have been decoded (Yes in S410), image decoding apparatus 200 ends the decoding process.
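The control flow of Steps S401 to S410 can be summarized as two nested loops. The sketch below is a hypothetical driver, not the structure of image decoding apparatus 200: the per-block work, the loop filter, and the two feature extractors are passed in as callables, and the bitstream is modeled as a list of pictures, each a list of coded blocks. It shows where the two feature point sets are built: per decoded block for intra prediction (S406) and per filtered picture for inter prediction (S409).

```python
from typing import Any, Callable, Iterable, List, Sequence


def decode_sequence(
    pictures: Iterable[Sequence[Any]],
    decode_block: Callable[[Any, List[Any], List[Any]], Any],
    extract_block_features: Callable[[Any], List[Any]],
    loop_filter: Callable[[List[Any]], Any],
    extract_picture_features: Callable[[Any], List[Any]],
) -> List[Any]:
    """Drive the per-block and per-picture steps; returns the frame memory."""
    frame_memory: List[Any] = []   # decoded images 227 (frame memory 208)
    ref_features: List[Any] = []   # feature points of reference pictures (S409)
    for coded_blocks in pictures:  # loop until the bitstream ends (S410)
        decoded_blocks: List[Any] = []
        intra_features: List[Any] = []  # feature points of the current picture (S406)
        for coded in coded_blocks:      # loop until one image is done (S407)
            # S401 to S405: parse, restore the residual, predict, and add.
            block = decode_block(coded, intra_features, ref_features)
            decoded_blocks.append(block)
            intra_features.extend(extract_block_features(block))   # S406
        picture = loop_filter(decoded_blocks)                        # S408
        frame_memory.append(picture)
        ref_features.append(extract_picture_features(picture))     # S409
    return frame_memory
```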

The inverse quantization and inverse frequency transform in Step S403 may be performed sequentially as separate processes or collectively. In currently dominant coding standards such as HEVC, the inverse quantization and the inverse frequency transform are performed collectively. As in the first exemplary embodiment, terms such as scaling are sometimes used on the decoding side for the inverse quantization process and the inverse frequency transform process.

The prediction information decoding process in Step S401 will be described in detail below with reference to the flowchart in FIG. 23.

Image decoding apparatus 200 determines whether the target block to be decoded is predicted by the intra prediction or the inter prediction (S421).

When the prediction technique is determined to be the intra prediction (“INTRA” in S421), image decoding apparatus 200 determines whether the prediction mode of the intra prediction is the usual intra prediction, in which the pixel information of the already-decoded adjacent block is used, or the feature point use intra prediction, in which the feature point is used (S422).

When the prediction mode is the usual intra prediction (“USUAL” in S422), image decoding apparatus 200 decodes information on the adjacent block use method from bitstream 221 (S424). As used herein, the adjacent block use method is information indicating an intra prediction direction, as in H.265/HEVC.

On the other hand, when the prediction mode is the feature point use intra prediction (“FEATURE POINT USE” in S422), image decoding apparatus 200 decodes the feature-point-related information, that is, the information on the feature point, from bitstream 221 (S425).

When the prediction technique is determined to be the inter prediction in Step S421 (“INTER” in S421), image decoding apparatus 200 determines whether the prediction mode of the inter prediction is the usual inter prediction, in which the motion information is used, or the feature point use inter prediction, in which the feature point is used (S423).

When the prediction mode is the usual inter prediction (“USUAL” in S423), image decoding apparatus 200 decodes the motion information from bitstream 221 (S426). As used herein, the motion information means a parallel translation vector, as used in video coding schemes typified by H.265/HEVC, or a high-order affine transform matrix.

On the other hand, when the prediction mode is the feature point use inter prediction (“FEATURE POINT USE” in S423), image decoding apparatus 200 decodes the feature-point-related information from bitstream 221 (S425).

The feature-point-related information decoding process in Step S425 may be performed by different processors for the intra prediction and the inter prediction. In that case, when the intra prediction process and the inter prediction process are performed simultaneously, the processes in Step S425 can run in parallel, which increases the processing speed.

For example, the determinations in Steps S421 to S423 are made based on information included in bitstream 221. Specifically, bitstream 221 includes information indicating whether the prediction mode is the intra prediction or the inter prediction. Bitstream 221 also includes information indicating whether the prediction mode is the usual intra prediction or the feature point use intra prediction, and information indicating whether the prediction mode is the usual inter prediction or the feature point use inter prediction.
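The decision tree of Steps S421 to S426 can be sketched as below. The bitstream syntax here is hypothetical: it assumes one already entropy-decoded symbol selects intra (0) or inter (1), a second selects usual (0) or feature point use (1), and the next symbol carries the mode-specific payload. The actual binarization is not specified in this description.

```python
from typing import Any, Iterator, Tuple


def parse_prediction_info(symbols: Iterator[Any]) -> Tuple[str, str, str, Any]:
    """Return (prediction type, mode, payload kind, payload) for one block."""
    pred_type = "INTER" if next(symbols) else "INTRA"          # S421
    mode = "FEATURE POINT USE" if next(symbols) else "USUAL"    # S422 / S423
    if mode == "FEATURE POINT USE":
        kind = "feature_point_info"   # S425, shared by intra and inter
    elif pred_type == "INTRA":
        kind = "intra_direction"      # S424: adjacent block use method
    else:
        kind = "motion_info"          # S426: translation vector or affine matrix
    return pred_type, mode, kind, next(symbols)
```

Because both feature point branches lead to the same payload kind, Step S425 can be dispatched to separate intra and inter processors, as noted above.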

The feature-point-related information decoding process in Step S425 will be described in detail below with reference to the flowchart in FIG. 24.

Image decoding apparatus 200 determines in which mode the feature-point-related information is coded (S441). When the in-block mode feature-point-related information, that is, the information on the feature point existing in the target block, is coded (“IN-BLOCK” in S441), image decoding apparatus 200 decodes the in-block mode feature-point-related information from bitstream 221 (S442).

On the other hand, when the surrounding block mode feature-point-related information, that is, the information on the feature point included in the already-decoded blocks surrounding the target block, is coded (“SURROUNDING BLOCK” in S441), image decoding apparatus 200 decodes the surrounding block mode feature-point-related information from bitstream 221 (S443).

In Step S441, image decoding apparatus 200 makes the determination by referring to a flag that is included in bitstream 221 and indicates the in-block mode or the surrounding block mode. The flag may be coded in units of blocks, or in units of images or videos. When the flag is coded in units of blocks, the quality of the prediction image improves because the optimum mode can be selected for each block. When the flag is coded in units of images, the coding quantity decreases because fewer flags are coded.

In the feature-point-related information decoding process of FIG. 24, the two modes are switched by way of example. Alternatively, only one of the in-block mode and the surrounding block mode may always be used.

The in-block mode feature-point-related information decoding process in Step S442 will be described in detail with reference to the flowchart in FIG. 25. Here, the in-block mode feature-point-related information includes the corresponding point information, which is information on the corresponding point, and the feature point information, which is information on the feature point in the target block.

Image decoding apparatus 200 decodes, from bitstream 221, the corresponding point information that determines the corresponding point of the feature point in the target block (S461). Specifically, the corresponding point information indicates the coordinate of the corresponding point.

Image decoding apparatus 200 determines whether the detailed information flag, which is included in bitstream 221 and indicates that bitstream 221 includes detailed information on the feature point, is turned on (S462).

When the detailed information flag is turned off (No in S462), image decoding apparatus 200 sets defined values as the detailed information (S463).

On the other hand, when the detailed information flag is turned on (Yes in S462), image decoding apparatus 200 decodes the detailed information from bitstream 221 (S464).

As used herein, the detailed information means information indicating the coordinate of the feature point that was included in the target block during coding, together with the rotation quantity and scale value of the feature point.

Specifically, the detailed information indicating the coordinate of the feature point is a two-dimensional vector in the x- and y-directions that gives the position of the feature point relative to the center of the target block. The rotation quantity and the scale value are values calculated by a local feature quantity extraction technique such as SIFT. Through these processes, the coordinate, rotation quantity, and scale value of the feature point in the target block and the information on the feature point serving as the corresponding point are decoded.

The rotation quantity and scale value of the feature point may be coded as values relative to the rotation quantity and scale value of the corresponding point. In that case, when the feature point and the corresponding point have the same rotation quantity or scale value, the value to be coded becomes 0, which improves the coding efficiency. A flag indicating which of the calculated value and the relative value is used may also be coded. This further improves the coding efficiency because the optimum coding technique can be selected for each block.
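A minimal sketch of the relative coding and its selection flag follows. Two choices are assumptions rather than part of the description: "relative" is taken as a plain difference for both the rotation quantity and the scale value (so identical values code as 0), and the encoder picks the representation with the smaller sum of magnitudes as a hypothetical stand-in for the real code length. Angle wrap-around is ignored for brevity.

```python
from typing import Tuple

Pair = Tuple[float, float]


def _cost(values: Pair) -> float:
    # Hypothetical proxy for the number of bits needed to code the pair.
    return abs(values[0]) + abs(values[1])


def encode_rotation_scale(feat_rot: float, feat_scale: float,
                          corr_rot: float, corr_scale: float) -> Tuple[bool, Pair]:
    """Return (use_relative flag, values to code) for one feature point."""
    absolute = (feat_rot, feat_scale)
    relative = (feat_rot - corr_rot, feat_scale - corr_scale)
    use_relative = _cost(relative) <= _cost(absolute)
    return use_relative, (relative if use_relative else absolute)


def decode_rotation_scale(use_relative: bool, values: Pair,
                          corr_rot: float, corr_scale: float) -> Pair:
    """Invert encode_rotation_scale using the corresponding point's values."""
    if use_relative:
        return (values[0] + corr_rot, values[1] + corr_scale)
    return (values[0], values[1])
```

When the feature point and its corresponding point share the same orientation and scale, the coded pair is `(0, 0.0)`, which is the case the paragraph above identifies as improving coding efficiency.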

For example, the defined values in Step S463 are the center of the target block for the coordinate of the feature point, 0 degrees for the rotation quantity, and 1 for the scale value.

The defined values are not limited to these. For example, a rotation angle and a scale factor obtained from global motion information may be used as the rotation quantity and scale value of the feature point, respectively. Exploiting the image deformation associated with global motion in this way can reduce the coding quantity compared with using fixed values.
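Steps S462 to S464, together with the defined values just described, can be sketched as follows. The names `FeatureDetail` and `resolve_feature_point` are hypothetical, the block center is assumed to be `(x + width / 2, y + height / 2)`, and the optional global rotation and scale stand in for values derived from global motion information.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class FeatureDetail:
    dx: float            # x offset of the feature point from the block center
    dy: float            # y offset of the feature point from the block center
    rotation_deg: float  # rotation quantity
    scale: float         # scale value


def resolve_feature_point(flag_on: bool, decoded: Optional[FeatureDetail],
                          bx: int, by: int, bw: int, bh: int,
                          global_rotation: Optional[float] = None,
                          global_scale: Optional[float] = None) -> Tuple[float, float, float, float]:
    """Return (x, y, rotation_deg, scale) of the feature point in the target block."""
    if flag_on:
        if decoded is None:
            raise ValueError("flag is on but no detailed information was decoded")
        detail = decoded  # S464: values read from the bitstream
    else:
        # S463: defined values, optionally taken from global motion.
        detail = FeatureDetail(
            0.0, 0.0,
            global_rotation if global_rotation is not None else 0.0,
            global_scale if global_scale is not None else 1.0,
        )
    cx = bx + bw / 2
    cy = by + bh / 2
    return (cx + detail.dx, cy + detail.dy, detail.rotation_deg, detail.scale)
```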

The surrounding block mode feature-point-related information decoding process in Step S443 will be described in detail below with reference to the flowchart in FIG. 26.

Image decoding apparatus 200 decodes the extraction range information, which is needed to set the extraction region where feature points are extracted (S481). Image decoding apparatus 200 then decodes the number information indicating the number of feature points used to generate the prediction image (S482).

Image decoding apparatus 200 determines whether the detailed information flag, which indicates that detailed information on the feature points exists, is turned on (S483).

When the detailed information flag is turned on (Yes in S483), image decoding apparatus 200 decodes the feature point designating information needed to identify the feature points used in generating the prediction image (S484). For example, the feature point designating information indicates the coordinate of the feature point.

Through these processes, the extraction region surrounding the target block from which feature points are extracted, the number of feature points used to generate the prediction image, and the information designating the feature points used are decoded.

The feature point designating information need not be the coordinate information of the feature point; any information that uniquely identifies the feature point may be used. For example, the feature point designating information may indicate the rank of the feature point when the feature points extracted in the extraction region are ordered by edge intensity or luminance component. Using the rank reduces the coding quantity because the feature point can be designated by one-dimensional information with a small numeric value.
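Designating feature points by rank works only if the coder and decoder produce exactly the same ordering. The sketch below assumes a half-open rectangular extraction region, uses a per-point `strength` value to stand for edge intensity, and breaks ties in raster order; the tie-breaking rule and all names are assumptions added so that the ranking is deterministic.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class FeaturePoint:
    x: int
    y: int
    strength: float  # e.g. edge intensity at the point


def rank_in_region(points: Sequence[FeaturePoint],
                   region: Tuple[int, int, int, int]) -> List[FeaturePoint]:
    """Rank points inside region (x0, y0, x1, y1), half-open, strongest first."""
    x0, y0, x1, y1 = region
    inside = [p for p in points if x0 <= p.x < x1 and y0 <= p.y < y1]
    # Ties are broken in raster order so coder and decoder agree exactly.
    return sorted(inside, key=lambda p: (-p.strength, p.y, p.x))


def select_by_rank(points: Sequence[FeaturePoint],
                   region: Tuple[int, int, int, int],
                   ranks: Sequence[int]) -> List[FeaturePoint]:
    """Return the feature points designated by the decoded rank values."""
    ranked = rank_in_region(points, region)
    return [ranked[r] for r in ranks]
```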

The prediction block generation process in Step S404 will be described in detail below with reference to the flowchart in FIG. 27.

Image decoding apparatus 200 determines which of the intra prediction and the inter prediction is used in the prediction block generation process (S501). As in Step S421, the determination in Step S501 is made using information included in bitstream 221.

When the intra prediction is used (“INTRA” in S501), image decoding apparatus 200 generates the prediction block through the intra prediction process (S502). On the other hand, when the inter prediction is used (“INTER” in S501), image decoding apparatus 200 generates the prediction block through the inter prediction process (S503).

The intra prediction process in Step S502 will be described in detail below with reference to the flowchart in FIG. 28.

Image decoding apparatus 200 determines whether the prediction mode of the intra prediction is the usual intra prediction or the feature point use intra prediction (S521). As in Step S422, the determination in Step S521 is made using information included in bitstream 221.

When the prediction mode is the usual intra prediction (“USUAL” in S521), image decoding apparatus 200 generates the prediction image by performing the intra prediction process used in H.265/HEVC (S522). On the other hand, when the prediction mode is the feature point use intra prediction (“FEATURE POINT USE” in S521), image decoding apparatus 200 generates the prediction image by the feature point use intra prediction (S523).

The prediction image generation process by the feature point use intra prediction in Step S523 will be described in detail below with reference to the flowchart in FIG. 29.

Image decoding apparatus 200 determines whether the feature point use mode is the in-block mode or the surrounding block mode (S541). As in Step S441, the determination in Step S541 is made using information included in bitstream 221.

When the feature point use mode is the in-block mode (“IN-BLOCK” in S541), image decoding apparatus 200 generates the prediction image in the in-block mode, that is, by the intra prediction process in which the feature point in the target block is used (S542).

On the other hand, when the feature point use mode is the surrounding block mode (“SURROUNDING BLOCK” in S541), image decoding apparatus 200 generates the prediction image in the surrounding block mode, that is, by the intra prediction process in which the feature point information included in the region surrounding the target block is used (S543).

The prediction image generation process in the in-block mode of Step S542 will be described in detail below with reference to the flowchart in FIG. 30.

Image decoding apparatus 200 searches for the corresponding point among the feature points extracted in Step S406 using the in-block mode feature-point-related information decoded in Step S442 (S561).

Specifically, image decoding apparatus 200 identifies the corresponding point among the plurality of feature points extracted in Step S406 using the corresponding point information (for example, the coordinate of the corresponding point) included in the feature-point-related information. When the corresponding point information is an index, image decoding apparatus 200 assigns indices to the plurality of feature points in the same manner as the coding side and sets the feature point having the index indicated by the corresponding point information as the corresponding point.
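Identification of the corresponding point from explicit corresponding point information can be sketched as below. The index allocation rule is an assumption: the description only requires that it match the coding side, and raster order (y, then x) is used here as one deterministic choice. The tagged-tuple encoding of the information (`"coord"` or `"index"`) is likewise hypothetical.

```python
from typing import List, Optional, Tuple, Union

Point = Tuple[int, int]


def identify_corresponding_point(points: List[Point],
                                 info: Tuple[str, Union[Point, int]]) -> Optional[Point]:
    """Return the feature point designated by the corresponding point information."""
    kind, value = info
    if kind == "coord":
        # Coordinate form: the designated point must be among the extracted points.
        return value if value in points else None
    if kind == "index":
        # Index form: both sides number the points in the same (assumed raster) order.
        indexed = sorted(points, key=lambda p: (p[1], p[0]))
        return indexed[value] if 0 <= value < len(indexed) else None
    raise ValueError(f"unknown corresponding point information: {kind}")
```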

Image decoding apparatus 200 also extracts at least one feature point included in the target block and its feature quantity, and determines the feature point to be used from among the extracted feature points using the feature point information (for example, the coordinate of the feature point) included in the feature-point-related information. When the feature point information indicates the rotation quantity and the scale value, image decoding apparatus 200 assigns the rotation quantity and scale value indicated by the feature point information to the feature point.

Using the feature point and corresponding point obtained in Step S561, image decoding apparatus 200 generates the prediction image by a process similar to that in Step S185 (S562).

The prediction image generation process in the surrounding block mode of Step S543 will be described in detail below with reference to the flowchart in FIG. 31.

Image decoding apparatus 200 extracts, from the feature points extracted in Step S406, the feature points included in the extraction region obtained from the extraction range information decoded in Step S481. From the feature points in the extraction region, image decoding apparatus 200 then extracts the feature points specified by the number information and the feature point designating information decoded in Steps S482 to S484 (S581).

Image decoding apparatus 200 searches for the corresponding point of the feature point extracted in Step S581 in the feature point group of the reference image extracted in Step S406 (S582). That is, as on the coding side, image decoding apparatus 200 uses the feature quantity to search for a corresponding point whose feature quantity is similar to that of the feature point surrounding the target block.
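The search by feature quantity similarity can be sketched as a nearest-neighbor match over descriptors. Two choices here are assumptions, not taken from this description: Euclidean distance (suited to SIFT-like vectors; binary ORB descriptors would normally use Hamming distance), and a nearest/second-nearest ratio test that treats an ambiguous match as a failed search. Coder and decoder would have to use the same rule and threshold for their results to agree.

```python
import math
from typing import List, Optional, Sequence


def search_corresponding_point(query: Sequence[float],
                               candidates: List[Sequence[float]],
                               ratio: float = 0.8) -> Optional[int]:
    """Return the index of the candidate descriptor most similar to query,
    or None when there are no candidates or the best match is ambiguous."""
    if not candidates:
        return None
    dists = sorted((math.dist(query, c), i) for i, c in enumerate(candidates))
    best_d, best_i = dists[0]
    if len(dists) > 1 and best_d >= ratio * dists[1][0]:
        return None  # ratio test failed: no clearly best match
    return best_i
```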

Using the corresponding point information obtained in Step S582, image decoding apparatus 200 generates the prediction image through a process similar to that in Step S225 (S583).

The inter prediction process in Step S503 will be described in detail below with reference to the flowchart in FIG. 32.

Image decoding apparatus 200 determines whether the prediction mode of the inter prediction is the usual inter prediction or the feature point use inter prediction (S601). As in Step S423, the determination in Step S601 is made using information included in bitstream 221.

When the prediction mode is the usual inter prediction (“USUAL” in S601), image decoding apparatus 200 generates the prediction image by a prediction image generation process that uses motion information, such as the inter prediction technique used in H.265/HEVC (S602). Although H.265/HEVC uses a parallel translation vector as the motion information, motion information including high-order information such as an affine transform matrix or a projective transform matrix may be used here.

On the other hand, when the prediction mode is the feature point use inter prediction (“FEATURE POINT USE” in S601), image decoding apparatus 200 generates the prediction image by the feature point use inter prediction (S603).

The prediction image generation process by the feature point use inter prediction in Step S603 will be described in detail below with reference to the flowchart in FIG. 33.

Image decoding apparatus 200 determines whether the feature point use mode is the in-block mode or the surrounding block mode (S621). As in Step S441, the determination in Step S621 is made using information included in bitstream 221.

When the feature point use mode is the in-block mode (“IN-BLOCK” in S621), image decoding apparatus 200 generates the prediction image in the in-block mode, that is, by the inter prediction process in which the feature point in the target block is used (S622).

On the other hand, when the feature point use mode is the surrounding block mode (“SURROUNDING BLOCK” in S621), image decoding apparatus 200 generates the prediction image in the surrounding block mode, that is, by the inter prediction process in which the feature point information included in the region surrounding the target block is used (S623).

The in-block mode prediction image generation process in Step S622 will be described in detail below with reference to the flowchart in FIG. 34.

Image decoding apparatus 200 searches for the corresponding point among the feature points extracted in Step S409 using the in-block mode feature-point-related information decoded in Step S442 (S641). The detailed process in Step S641 is similar to that in Step S561, except that the reference destination is another already-decoded picture.

Using the feature point and corresponding point obtained in Step S641, image decoding apparatus 200 generates the prediction image by a process similar to that in Step S305 (S642).

The prediction image generation process in the surrounding block mode of Step S623 will be described in detail below with reference to the flowchart in FIG. 35.

Image decoding apparatus 200 extracts, from the feature points extracted in Step S409, the feature points included in the extraction region obtained from the extraction range information decoded in Step S481. From the feature points in the extraction region, image decoding apparatus 200 then extracts the feature points specified by the number information and the feature point designating information decoded in Steps S482 to S484 (S661).

Image decoding apparatus 200 searches for the corresponding point of the feature point extracted in Step S661 in the feature point group of the reference image extracted in Step S409 (S662).

Using the corresponding point information obtained in Step S662, image decoding apparatus 200 generates the prediction image through a process similar to that in Step S325 (S663).

Advantageous Effect

As described above, according to the third exemplary embodiment, a bitstream in which information for the feature point use intra and inter prediction techniques is coded can be decoded. With the configuration of the third exemplary embodiment, the bitstream can be decoded through a prediction image generation process that uses the corresponding point of the feature point, so a higher-quality image can be played back.

As described in the first exemplary embodiment, various feature quantities can be used as the local feature quantity; other local feature quantity technologies such as ORB or SURF may be used. The feature point extraction technique may also differ from the feature quantity calculation technique.

As described in the first exemplary embodiment, the processes in Steps S561 and S641 are not limited to using a single feature point in the target block; a plurality of feature points may be used. In that case, a process of decoding the number of coded feature points is added to the decoding process in Step S442, and Steps S461 to S464 are repeated as many times as the number of coded feature points. Using a plurality of feature points improves the accuracy of the prediction image, which reduces the residual component and improves the coding efficiency.
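With three or more feature point pairs, one way to use all of them (an illustrative choice, not a method stated in this description) is to fit a six-parameter affine transform by least squares. The sketch below solves the 3x3 normal equations with Cramer's rule in plain Python; `fit_affine` is a hypothetical name, and collinear point sets are rejected as degenerate.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def _det3(m: Sequence[Sequence[float]]) -> float:
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def _solve3(m: List[List[float]], v: List[float]) -> List[float]:
    """Solve m @ p = v by Cramer's rule."""
    d = _det3(m)
    if abs(d) < 1e-12:
        raise ValueError("degenerate point configuration")
    out = []
    for col in range(3):
        mc = [row[:] for row in m]
        for r in range(3):
            mc[r][col] = v[r]
        out.append(_det3(mc) / d)
    return out


def fit_affine(src: List[Point], dst: List[Point]) -> Tuple[float, ...]:
    """Least-squares (a, b, c, d, e, f) with x' = a*x + b*y + c, y' = d*x + e*y + f,
    mapping feature points (src) onto their corresponding points (dst)."""
    if len(src) < 3 or len(src) != len(dst):
        raise ValueError("need at least 3 matched pairs")
    # Normal equations (A^T A) p = A^T t, where each row of A is (x, y, 1).
    ata = [[0.0] * 3 for _ in range(3)]
    atx = [0.0] * 3
    aty = [0.0] * 3
    for (x, y), (u, v) in zip(src, dst):
        row = (x, y, 1.0)
        for i in range(3):
            for j in range(3):
                ata[i][j] += row[i] * row[j]
            atx[i] += row[i] * u
            aty[i] += row[i] * v
    a, b, c = _solve3(ata, atx)
    d, e, f = _solve3(ata, aty)
    return (a, b, c, d, e, f)
```

Least squares averages out small position errors across the pairs, which is one way the extra feature points could translate into a more accurate prediction image.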

In the first exemplary embodiment, the above technique is used, by way of example, in both the inter prediction and the intra prediction. Alternatively, the technique may be used in only one of them. The first exemplary embodiment also describes, by way of example, both the prediction process (inter or intra prediction) that uses the feature point in the target block and the prediction process (inter or intra prediction) that uses the feature points surrounding the target block. Alternatively, only one of these two prediction processes may be used.

In the above description, the corresponding point information indicating the corresponding point is transmitted to the decoding side in the in-block mode, but not in the surrounding block mode. Alternatively, the corresponding point information may also be transmitted to the decoding side in the surrounding block mode. In this case, image decoding apparatus 200 does not perform the corresponding point search using the feature quantity, but instead identifies the corresponding point among the plurality of feature points using the corresponding point information.

In the in-block mode, the corresponding point information need not be transmitted to the decoding side. In this case, as on the coding side, image decoding apparatus 200 identifies the corresponding point by performing the corresponding point search using the feature quantity.

As described above, the image coding method and image decoding method of the third exemplary embodiment include the prediction image generation method illustrated in FIG. 36.

The prediction image generation apparatus of the third exemplary embodiment generates the prediction image of the target block. The prediction image generation apparatus extracts a plurality of first feature points, each of which is included in the reconstructed image and has a local feature quantity (S701). As used herein, the reconstructed image means the already-coded or already-decoded blocks included in the target picture containing the target block in the case of intra prediction, and an already-coded or already-decoded picture different from the target picture in the case of inter prediction.

In the image coding apparatus including the prediction image generation apparatus, a plurality of third feature points corresponding to the target block are extracted. Specifically, the third feature points are feature points included in the target block in the in-block mode, and feature points located in the surroundings of, but not included in, the target block in the surrounding block mode.

Then, the prediction image generation apparatus searches for the corresponding point among the plurality of first feature points. The corresponding point has a local feature quantity similar to that of the second feature point corresponding to the target block, and its relationship with the second feature point is expressed by information including a non-parallel translation component (S702). Specifically, the second feature point is a feature point included in the target block in the in-block mode, and a feature point located in the surroundings of, but not included in, the target block in the surrounding block mode.

Specifically, the image coding apparatus selects the second feature point from the plurality of third feature points corresponding to the target block. The image coding apparatus searches the plurality of first feature points for the corresponding point whose local feature quantity is similar to that of the second feature point, based on the similarity of the local feature quantity. The image coding apparatus codes the feature point information identifying the second feature point among the plurality of third feature points, and transmits the coded feature point information to the image decoding apparatus. The image coding apparatus may also code the corresponding point information identifying the corresponding point among the plurality of first feature points.

On the other hand, the image decoding apparatus decodes the feature point information. The image decoding apparatus extracts the plurality of third feature points corresponding to the target block, and selects the second feature point from them using the feature point information. As on the coding side, the image decoding apparatus searches the plurality of first feature points for the corresponding point whose local feature quantity is similar to that of the second feature point, based on the similarity of the local feature quantity. When the corresponding point information is included in the bitstream, the image decoding apparatus decodes it and identifies the corresponding point among the plurality of first feature points using the decoded corresponding point information.

Then, the prediction image generation apparatus generates the prediction image from the reconstructed image based on the relationship between the second feature point and the corresponding point (S703). Specifically, the prediction image generation apparatus generates the prediction image from the pixel values surrounding the corresponding point in the reconstructed image based on this relationship. For example, in the in-block mode, the prediction image generation apparatus generates the prediction image using the pixel values of a region that includes the corresponding point in the reconstructed image. In the surrounding block mode, it generates the prediction image using the pixel values of a region that does not include the corresponding point in the reconstructed image.
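Step S703 can be illustrated by mapping each pixel of the target block through the relationship between the second feature point and the corresponding point and sampling the reconstructed image there. This is a sketch under stated assumptions: the relationship is modeled as rotation and scaling about the feature point followed by translation onto the corresponding point, sampling is nearest-neighbor, and out-of-bounds positions are clamped to the image border. The name `predict_block` is hypothetical; an actual codec would likely use a sub-pixel interpolation filter instead.

```python
import math
from typing import List, Tuple

Image = List[List[int]]


def predict_block(ref: Image, bx: int, by: int, w: int, h: int,
                  feat: Tuple[float, float], corr: Tuple[float, float],
                  rotation_deg: float, scale: float) -> Image:
    """Build a w x h prediction for the block at (bx, by) by mapping each
    target pixel p to corr + scale * R(rotation) * (p - feat) in ref."""
    cos_s = math.cos(math.radians(rotation_deg)) * scale
    sin_s = math.sin(math.radians(rotation_deg)) * scale
    ref_h, ref_w = len(ref), len(ref[0])
    pred = []
    for y in range(by, by + h):
        row = []
        for x in range(bx, bx + w):
            dx, dy = x - feat[0], y - feat[1]
            rx = corr[0] + cos_s * dx - sin_s * dy
            ry = corr[1] + sin_s * dx + cos_s * dy
            # Nearest-neighbor sampling with clamping at the image border.
            ix = min(max(int(round(rx)), 0), ref_w - 1)
            iy = min(max(int(round(ry)), 0), ref_h - 1)
            row.append(ref[iy][ix])
        pred.append(row)
    return pred
```

With a reference image whose sample at (x, y) is `10 * y + x`, a pure one-pixel shift (`feat=(1, 1)`, `corr=(2, 1)`, no rotation, unit scale) predicts the 2x2 block at the origin as `[[1, 2], [11, 12]]`.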

Instead of generating the prediction image directly from the pixel values surrounding the corresponding point, the image coding apparatus may set the initial value of the motion estimation process based on the relationship between the second feature point and the corresponding point, as described in the second exemplary embodiment, and generate the prediction image by performing the motion estimation process starting from that initial value.
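A minimal sketch of motion estimation seeded by the feature point relationship follows. It assumes integer-pel full search with a SAD (sum of absolute differences) cost in a small window around the initial vector, and takes the initial vector to be the rounded displacement from the second feature point to the corresponding point. The window size, cost function, and helper names are assumptions for illustration, not taken from the second exemplary embodiment.

```python
def sad(cur, ref, bx, by, size, mvx, mvy):
    # Sum of absolute differences between the current block and the displaced reference block.
    return sum(abs(cur[by + j][bx + i] - ref[by + j + mvy][bx + i + mvx])
               for j in range(size) for i in range(size))


def initial_vector(second_xy, corresponding_xy):
    # Hypothetical seed: displacement from the second feature point to the corresponding point.
    return (round(corresponding_xy[0] - second_xy[0]),
            round(corresponding_xy[1] - second_xy[1]))


def motion_search(cur, ref, bx, by, size, init_mv, search_range=2):
    """Full search in a (2*search_range+1)^2 window centred on init_mv."""
    h, w = len(ref), len(ref[0])
    best_mv, best_cost = None, None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            mvx, mvy = init_mv[0] + dx, init_mv[1] + dy
            # Skip candidates whose reference block falls outside the picture.
            if not (0 <= bx + mvx and bx + mvx + size <= w and
                    0 <= by + mvy and by + mvy + size <= h):
                continue
            cost = sad(cur, ref, bx, by, size, mvx, mvy)
            if best_cost is None or cost < best_cost:
                best_mv, best_cost = (mvx, mvy), cost
    return best_mv, best_cost
```

The point of the seed is that a small window suffices when the feature point relationship already lands close to the true displacement.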

The image coding apparatus codes the target block using the generated prediction image, and the image decoding apparatus decodes the target block using the generated prediction image.

Although the prediction image generation method, the image coding method, and the image decoding method of the third exemplary embodiment are described above, the present disclosure is not limited to the third exemplary embodiment.

Also, the individual processors included in the prediction image generation apparatus, the image coding apparatus, and the image decoding apparatus of the third exemplary embodiment are typically implemented as an LSI (Large-Scale Integration), which is an integrated circuit. These processors may be formed as separate chips, or some or all of the processors may be integrated into a single chip.

Also, the circuit integration is not limited to LSI, and may be implemented using a dedicated circuit or a general-purpose processor. An FPGA (Field Programmable Gate Array) that is programmable after manufacturing of an LSI, or a reconfigurable processor in which the connections and settings of circuit cells within the LSI are reconfigurable, may be used.

In each of the first to third exemplary embodiments, the individual components may be implemented with dedicated hardware or by executing a software program suitable for each component. The individual components may be implemented by a program execution unit, such as a CPU (Central Processing Unit) or a processor, loading and executing a software program recorded on a recording medium such as a hard disk or a semiconductor memory.

In other words, the prediction image generation apparatus, the image coding apparatus, and the image decoding apparatus include processing circuitry and storage electrically connected to the processing circuitry (accessible from the processing circuitry). The processing circuitry includes at least one of dedicated hardware and the program execution unit. When the processing circuitry includes the program execution unit, the storage stores a software program to be executed by the program execution unit. The processing circuitry performs the prediction image generation method, the image coding method, or the image decoding method of the first to third exemplary embodiments using the storage.

The present disclosure may be implemented as a software program or as a non-transitory computer-readable recording medium on which the program is recorded. The program can also be distributed through a transmission medium such as the Internet.

Also, the numerical values used above are merely illustrative values used to describe the present disclosure specifically, and the present disclosure is not limited to them.

Also, the division of functional blocks in the block diagrams is merely an example: a plurality of functional blocks may be implemented as one functional block, one functional block may be divided into a plurality of functional blocks, or part of a function may be transferred to another functional block. Also, the functions of a plurality of functional blocks having similar functions may be processed in parallel or in a time-divided manner by hardware or software.

The order in which the steps included in the above prediction image generation method, image coding method, or image decoding method are executed is merely illustrative, used to describe the present disclosure specifically, and the steps may be executed in a different order. Also, some of the steps may be executed simultaneously (in parallel) with other steps.

While the prediction image generation apparatus, the image coding apparatus, and the image decoding apparatus have been described above based on the exemplary embodiments of the present disclosure, the present disclosure is not limited to these exemplary embodiments. Exemplary embodiments obtained by applying various modifications conceivable by a person skilled in the art, and exemplary embodiments formed by combining elements of different exemplary embodiments, may also fall within the scope of one or more aspects of the present disclosure, as long as they do not depart from the gist of the present disclosure.

Fourth Exemplary Embodiment

The processes described in each of the first to third exemplary embodiments above can easily be implemented in a standalone computer system by recording, on a storage medium, a program for implementing the configuration of the video coding method (image coding method) or the video decoding method (image decoding method) described in each exemplary embodiment. The storage medium may be any type of medium capable of storing the program, such as a magnetic disk, an optical disc, a magneto-optical disk, an IC (Integrated Circuit) card, or a semiconductor memory.

Exemplary applications of the video coding method (image coding method) and the video decoding method (image decoding method) described in each of the first to third exemplary embodiments, and systems using them, will now be described. The systems include an image coding/decoding apparatus that includes an image coding apparatus employing the image coding method and an image decoding apparatus employing the image decoding method. Other configurations of the systems can be changed as appropriate depending on the circumstances.

FIG. 37 is a view illustrating the overall configuration of content providing system ex100, which implements content distribution services. The area in which communication services are provided is divided into cells of a desired size, and base stations ex106, ex107, ex108, ex109, and ex110, which are fixed wireless stations, are installed in the respective cells.

In content providing system ex100, various devices such as computer ex111, PDA (Personal Digital Assistant) ex112, camera ex113, mobile phone ex114, and game machine ex115 are connected to Internet ex101 through Internet service provider ex102, telephone network ex104, and base stations ex106 to ex110.

The configuration of content providing system ex100 is not limited to the configuration illustrated in FIG. 37, and any combination of the elements may be connected. Each device may be connected directly to telephone network ex104 without using base stations ex106 to ex110, which are fixed wireless stations. Alternatively, the devices may be interconnected directly through near field communication or the like.

Camera ex113 is a device capable of capturing video, such as a digital camcorder. Camera ex116 is a device capable of capturing still images and video, such as a digital camera. Mobile phone ex114 may be any of a mobile phone based on the GSM (registered trademark) (Global System for Mobile Communications) scheme, the CDMA (Code Division Multiple Access) scheme, the W-CDMA (Wideband-Code Division Multiple Access) scheme, the LTE (Long Term Evolution) scheme, or the HSPA (High Speed Packet Access) scheme, a PHS (Personal Handyphone System), and so forth.

In content providing system ex100, camera ex113 or the like is connected to streaming server ex103 through base station ex109 and telephone network ex104, which enables live streaming. During live streaming, content captured by the user with camera ex113 (for example, video of a music event) is coded as described in each of the above exemplary embodiments (that is, camera ex113 functions as the image coding apparatus according to one aspect of the present disclosure), and the resulting content is transmitted to streaming server ex103. Streaming server ex103 in turn distributes the received content as a stream to clients that have made a request. Examples of the clients include computer ex111, PDA ex112, camera ex113, mobile phone ex114, and game machine ex115, which are capable of decoding the coded data. Each device that receives the distributed data decodes and plays back the received data (that is, the device functions as the image decoding apparatus according to one aspect of the present disclosure).

Note that the coding process on the captured data may be performed by camera ex113, by streaming server ex103, which performs the data transmission process, or by both on a processing-sharing basis. Similarly, the decoding process on the distributed data may be performed by the client, by streaming server ex103, or by both on a processing-sharing basis. In addition to still and/or video image data captured by camera ex113, still and/or video image data captured by camera ex116 may be transmitted to streaming server ex103 through computer ex111. In this case, the coding process may be performed by any of camera ex116, computer ex111, and streaming server ex103, or by all of them on a processing-sharing basis.

These coding and decoding processes are generally performed by LSI ex500 included in computer ex111 or in each device. LSI ex500 may be formed as a single chip or as a plurality of chips. Alternatively, software for video coding/decoding may be recorded on a recording medium (such as a CD-ROM, a flexible disk, or a hard disk) readable by computer ex111 or the like, and the coding and decoding processes may be performed using that software. Further, when mobile phone ex114 is equipped with a camera, video data captured by the camera may be transmitted. This video data has been coded by LSI ex500 included in mobile phone ex114.

Also, streaming server ex103 may consist of a plurality of servers or computers that process, record, and distribute data in a distributed manner.

In the above-described manner, content providing system ex100 allows the client to receive and play back coded data. Content providing system ex100 thus allows the client to receive, decode, and play back information transmitted by a user in real time, so that even a user without special rights or equipment can implement personal broadcasting.

In addition to content providing system ex100, at least one of the video coding apparatus (image coding apparatus) and the video decoding apparatus (image decoding apparatus) according to each of the above exemplary embodiments can be incorporated in digital broadcasting system ex200, as illustrated in FIG. 38. Specifically, broadcasting station ex201 transmits, through radio waves, multiplexed data obtained by multiplexing video data, music data, and the like to broadcasting satellite ex202. This video data has been coded using the video coding method described in each of the above exemplary embodiments (that is, by the image coding apparatus according to one aspect of the present disclosure). Upon receiving this data, broadcasting satellite ex202 transmits broadcasting radio waves, and home antenna ex204, which is capable of receiving satellite broadcasting, receives these radio waves. An apparatus such as television (receiver) ex300 or set top box (STB) ex217 decodes and plays back the received multiplexed data (that is, the apparatus functions as the image decoding apparatus according to one aspect of the present disclosure).

The video decoding apparatus or video coding apparatus described in each of the above exemplary embodiments can also be implemented in reader/recorder ex218, which reads and decodes multiplexed data recorded on recording medium ex215 such as a DVD (Digital Versatile Disc) or a BD (Blu-ray Disc), or which codes a video signal, multiplexes it with a music signal depending on the circumstances, and writes the resulting signal on recording medium ex215. In this case, the played-back video signal is displayed on monitor ex219, and the video signal can be played back by another apparatus or system using recording medium ex215 on which the multiplexed data is recorded. Alternatively, the video decoding apparatus may be implemented in set top box ex217 connected to cable ex203 for cable television or to home antenna ex204 for satellite/terrestrial broadcasting, and the video signal may be displayed on monitor ex219 of television ex300. In this case, the video decoding apparatus may be incorporated into television ex300 instead of set top box ex217.

FIG. 39 is a view illustrating television (receiver) ex300 that employs the video decoding method and the video coding method described in each of the above exemplary embodiments. Television ex300 includes: tuner ex301, which obtains or outputs, through antenna ex204 or cable ex203 that receives broadcasts, multiplexed data in which video data and audio data are multiplexed together; modulator/demodulator ex302, which demodulates the received multiplexed data or modulates multiplexed data to be transmitted to the outside; and multiplexer/demultiplexer ex303, which demultiplexes the demodulated multiplexed data into video data and audio data, or multiplexes video data and audio data that have been coded by signal processor ex306.

Television ex300 also includes signal processor ex306 and output unit ex309. Signal processor ex306 includes audio signal processor ex304, which decodes or codes audio data, and video signal processor ex305, which decodes or codes video data (video signal processor ex305 functions as the image coding apparatus or the image decoding apparatus according to one aspect of the present disclosure). Output unit ex309 includes speaker ex307, which outputs the decoded audio signal, and display unit ex308, such as a display, which displays the decoded video signal. Television ex300 further includes interface unit ex317, which includes operation input unit ex312 that accepts user operations, as well as controller ex310, which controls the individual units in an integrated manner, and power supply circuit unit ex311, which supplies electric power to the individual units. In addition to operation input unit ex312, interface unit ex317 may include bridge ex313 to be connected to an external device such as reader/recorder ex218, slot unit ex314 that enables attachment of recording medium ex216 such as an SD card, driver ex315 for connection to external recording medium ex215 such as a hard disk, and modem ex316 for connection to telephone network ex104. Recording medium ex216 can store information electrically using a nonvolatile/volatile semiconductor memory included therein. The individual units of television ex300 are connected to one another through a synchronization bus.

A configuration that allows television ex300 to decode and play back multiplexed data obtained from the outside through antenna ex204 or the like will be described first. Television ex300 receives a user operation from remote control ex220 or the like. Under control of controller ex310, which includes a CPU or the like, multiplexer/demultiplexer ex303 demultiplexes the multiplexed data demodulated by modulator/demodulator ex302. In television ex300, audio signal processor ex304 decodes the separated audio data, and video signal processor ex305 decodes the separated video data using the image decoding method described in each of the above exemplary embodiments. The decoded audio signal and video signal are then output to the outside from output unit ex309. When the audio signal and the video signal are output, these signals may be temporarily stored in buffers ex318 and ex319 or the like so that they are played back in synchronization with each other. Television ex300 may also read multiplexed data not only from broadcasts but also from recording media ex215 and ex216, such as a magnetic/optical disk and an SD card.

Next, a configuration that allows television ex300 to code an audio signal and a video signal and to transmit the resulting signals to the outside or write them on a recording medium or the like will be described. Television ex300 receives a user operation from remote control ex220 or the like. Under control of controller ex310, audio signal processor ex304 codes the audio signal, and video signal processor ex305 codes the video signal using the image coding method described in each of the above exemplary embodiments. The coded audio signal and video signal are multiplexed by multiplexer/demultiplexer ex303, and the resulting multiplexed signal is output to the outside. When the audio signal and the video signal are multiplexed, these signals may be temporarily stored in buffers ex320 and ex321 or the like so that they are synchronized with each other. A plurality of buffers may be provided, as illustrated by buffers ex318, ex319, ex320, and ex321, or one or more buffers may be shared. Further, in addition to the illustrated buffers, data may be stored in a buffer that serves, for example, to prevent system overflow or underflow between modulator/demodulator ex302 and multiplexer/demultiplexer ex303.

Television ex300 may also include a configuration for receiving audio/video input from a microphone or a camera, in addition to the configuration for obtaining audio data and video data from broadcasts, recording media, or the like, and may perform the coding process on the data obtained therefrom. Although television ex300 has been described as being capable of performing the above-described coding process, multiplexing, and output to the outside, television ex300 may instead be incapable of these processes and be capable only of reception, the decoding process, and output to the outside.

When reader/recorder ex218 reads multiplexed data from or writes it to a recording medium, the decoding process or the coding process may be performed by television ex300, by reader/recorder ex218, or by both television ex300 and reader/recorder ex218 on a processing-sharing basis.

FIG. 40 illustrates an example configuration of information playback/recording unit ex400 used when data is read from or written to an optical disc. Information playback/recording unit ex400 includes optical head ex401, modulation recorder ex402, playback demodulator ex403, buffer ex404, disk motor ex405, servo controller ex406, and system controller ex407. Optical head ex401 irradiates the recording surface of recording medium ex215, which is an optical disc, with a laser spot to write information, and detects light reflected from the recording surface of recording medium ex215 to read information. Modulation recorder ex402 electrically drives a semiconductor laser included in optical head ex401 and modulates the laser beam according to the data to be recorded. Playback demodulator ex403 amplifies the playback signal obtained when a photodetector included in optical head ex401 electrically detects the light reflected from the recording surface, separates and demodulates the signal components recorded on recording medium ex215, and plays back the necessary information. Buffer ex404 temporarily stores information to be recorded on recording medium ex215 and information played back from recording medium ex215. Disk motor ex405 rotates recording medium ex215. Servo controller ex406 moves optical head ex401 to a given information track while controlling the rotational driving of disk motor ex405, thereby performing laser spot tracking. System controller ex407 controls information playback/recording unit ex400 as a whole. The reading and writing processes described above are implemented by system controller ex407 recording and playing back information through optical head ex401 while causing modulation recorder ex402, playback demodulator ex403, and servo controller ex406 to operate in cooperation, using various pieces of information held in buffer ex404, and generating and adding new information as needed. System controller ex407 includes, for example, a microprocessor, and performs these processes by executing a read/write program.

Although optical head ex401 has been described above as irradiating the recording surface with a laser spot, optical head ex401 may be configured to perform high-density recording using near field light.

FIG. 41 is a schematic diagram of recording medium ex215, which is an optical disc. A guide groove is formed in a spiral on the recording surface of recording medium ex215, and address information indicating absolute positions on the disc is pre-recorded in information track ex230 through changes in the shape of the groove. This address information includes information identifying the positions of recording blocks ex231, which are the units in which data is recorded. An apparatus that performs recording and playback can identify a recording block by playing back information track ex230 and reading the address information. Recording medium ex215 also includes data recording area ex233, inner circumference area ex232, and outer circumference area ex234. Data recording area ex233 is used to record user data. Inner circumference area ex232 and outer circumference area ex234, located inside and outside data recording area ex233, are used for specific purposes other than recording user data. Information playback/recording unit ex400 reads and writes coded audio data, coded video data, or multiplexed data of these in data recording area ex233 of recording medium ex215 configured in this way.

Although a single-layer optical disc such as a DVD or BD has been described above as an example, the optical disc is not limited to this and may be a multi-layer optical disc on which recording can be performed on layers other than the surface layer. Alternatively, the optical disc may be one on which multi-dimensional recording/playback can be performed, for example by recording information at the same position on the disc using light of mutually different wavelengths, or by recording information on different layers from various angles.

In digital broadcasting system ex200, vehicle ex210 equipped with antenna ex205 may also receive data from broadcasting satellite ex202 or the like and play back video on a display device of car navigation system ex211 mounted in vehicle ex210. A conceivable configuration of car navigation system ex211 is the configuration illustrated in FIG. 39 with a GPS reception unit added, and the same applies to computer ex111, mobile phone ex114, and the like.

FIG. 42A is a view illustrating mobile phone ex114, which employs the video decoding method and the video coding method described in the above exemplary embodiments. Mobile phone ex114 includes antenna ex350, which transmits and receives radio waves to and from base station ex110; camera unit ex365, which is capable of capturing video and still images; and display unit ex358, such as a liquid crystal display, which displays video captured by camera unit ex365 and data obtained by decoding video or the like received through antenna ex350. Mobile phone ex114 further includes a body including operation key unit ex366; audio output unit ex357, such as a speaker, for outputting audio; audio input unit ex356, such as a microphone, for inputting audio; memory unit ex367, which stores coded or decoded data of captured video, captured still images, recorded audio, received video, received still images, received emails, or the like; and slot unit ex364, which is an interface to a recording medium that similarly stores data.

A configuration example of mobile phone ex114 will be described with reference to FIG. 42B. Mobile phone ex114 includes main controller ex360, which controls in an integrated manner the individual units of the body, including display unit ex358 and operation key unit ex366. Mobile phone ex114 also includes power supply circuit unit ex361, operation input controller ex362, video signal processor ex355, camera interface unit ex363, LCD (Liquid Crystal Display) controller ex359, modulator/demodulator ex352, multiplexer/demultiplexer ex353, audio signal processor ex354, slot unit ex364, and memory unit ex367, which are connected to main controller ex360 through bus ex370.

When the on-hook/power key is turned on by a user operation, power supply circuit unit ex361 supplies electric power from a battery pack to the individual units, activating mobile phone ex114 into an operable state.

In the voice call mode of mobile phone ex114, under control of main controller ex360, which includes a CPU, a ROM, and a RAM, audio signal processor ex354 converts the audio signal captured by audio input unit ex356 into a digital audio signal, modulator/demodulator ex352 performs spread spectrum processing on this digital audio signal, and transmitter/receiver ex351 performs digital-to-analog conversion and frequency conversion on the signal and then transmits the resulting signal through antenna ex350. Also in the voice call mode, transmitter/receiver ex351 amplifies data received through antenna ex350 and performs frequency conversion and analog-to-digital conversion on it, modulator/demodulator ex352 performs inverse spread spectrum processing on the resulting signal, and audio signal processor ex354 converts it into an analog audio signal, which is then output from audio output unit ex357.

When an email is transmitted in the data communication mode, the text data of the email, entered by operating operation key unit ex366 of the body or the like, is sent to main controller ex360 through operation input controller ex362. Main controller ex360 causes modulator/demodulator ex352 to perform spread spectrum processing on the text data and causes transmitter/receiver ex351 to perform digital-to-analog conversion and frequency conversion on it, and then transmits the resulting data to base station ex110 through antenna ex350. When an email is received, substantially the reverse processing is performed on the received data, and the resulting text data is output to display unit ex358.

When video, a still image, or a combination of video and audio is transmitted in the data communication mode, video signal processor ex355 compresses and codes the video signal supplied from camera unit ex365 using the video coding method described in each of the above exemplary embodiments (that is, video signal processor ex355 functions as the image coding apparatus according to one aspect of the present disclosure), and sends the coded video data to multiplexer/demultiplexer ex353. Meanwhile, audio signal processor ex354 codes the audio signal captured by audio input unit ex356 while camera unit ex365 is capturing the video, still image, or the like, and sends the coded audio data to multiplexer/demultiplexer ex353.

Multiplexer/demultiplexer ex353 multiplexes the coded video data supplied from video signal processor ex355 and the coded audio data supplied from audio signal processor ex354 according to a predetermined scheme. Modulator/demodulator (modulation/demodulation circuit unit) ex352 performs spread spectrum processing on the resulting multiplexed data. Transmitter/receiver ex351 performs digital-to-analog conversion and frequency conversion on the multiplexed data, and then transmits the resulting data through antenna ex350.

When data of a video file linked to a website or the like is received in the data communication mode, multiplexer/demultiplexer ex353 demultiplexes the multiplexed data received through antenna ex350 into a video data bitstream and an audio data bitstream in order to decode it. Multiplexer/demultiplexer ex353 supplies the coded video data to video signal processor ex355 and the coded audio data to audio signal processor ex354 through synchronization bus ex370. Video signal processor ex355 decodes the video signal using a video decoding method corresponding to the video coding method described in each of the above exemplary embodiments (that is, video signal processor ex355 functions as the image decoding apparatus according to one aspect of the present disclosure). Then, for example, the video or still image included in the video file linked to the website is displayed on display unit ex358 through LCD controller ex359. Audio signal processor ex354 decodes the audio signal, and the resulting audio is output from audio output unit ex357.

As with television ex300, three implementation forms are conceivable for a terminal such as mobile phone ex114: a transmission/reception terminal including both an encoder and a decoder, a transmission terminal including only an encoder, and a reception terminal including only a decoder. Further, although digital broadcasting system ex200 has been described as receiving and transmitting multiplexed data in which video data, audio data, and so forth are multiplexed, the multiplexed data may also include text data related to the video in addition to audio data, and the video data itself may be used instead of multiplexed data.

As described above, the video coding method or video decoding method described in each of the above exemplary embodiments is applicable to any of the aforementioned devices and systems, and the advantages described in each of the above exemplary embodiments can thereby be obtained.

Also, the present disclosure is not limited to the exemplary embodiments above, and various modifications and corrections can be made without departing from the scope of the present disclosure.

Fifth Exemplary Embodiment

Video data can also be generated by switching, as appropriate, between the video coding method or apparatus described in each of the above exemplary embodiments and a video coding method or apparatus based on a different standard, such as MPEG-2, MPEG-4 AVC, or VC-1.

When a plurality of pieces of video data based on different standards are generated, a decoding method corresponding to each standard needs to be selected at the time of decoding. However, if it cannot be identified which standard the video data to be decoded is based on, an appropriate decoding method cannot be selected.

To address this, multiplexed data in which audio data or the like is multiplexed with video data is configured to include identification information indicating which standard the video data is based on. A specific structure of multiplexed data including video data generated using the video coding method or apparatus described in each of the above exemplary embodiments is described below. The multiplexed data is a digital stream in the MPEG-2 transport stream format.

FIG. 43 is a view illustrating the structure of the multiplexed data. As illustrated in FIG. 43, the multiplexed data is obtained by multiplexing at least one of a video stream, an audio stream, a presentation graphics stream (PG), and an interactive graphics stream (IG). The video stream represents the main video and sub video of a movie. The audio stream represents the main audio of the movie and sub audio to be mixed with the main audio. The presentation graphics stream represents the subtitles of the movie. Here, the main video refers to video normally displayed on a screen, whereas the sub video refers to video displayed in a small window within the main video. The interactive graphics stream represents a dialog screen created by placing GUI components on the screen. The video stream is coded using the video coding method or apparatus described in each of the above exemplary embodiments, or using a video coding method or apparatus conforming to an existing standard such as MPEG-2, MPEG-4 AVC, or VC-1. The audio stream is coded using a standard such as Dolby AC-3 (Audio Code number 3), Dolby Digital Plus, MLP (Meridian Lossless Packing), DTS (Digital Theater Systems), DTS-HD, or linear PCM (Pulse Code Modulation).

Each stream included in the multiplexed data is identified by a PID (Packet Identifier). For example, 0x1011 is assigned to a video stream to be used as video of a movie. Any one of 0x1100 to 0x111F is assigned to an audio stream. Any one of 0x1200 to 0x121F is assigned to a presentation graphics stream. Any one of 0x1400 to 0x141F is assigned to an interactive graphics stream. Any one of 0x1B00 to 0x1B1F is assigned to a video stream to be used as sub video of the movie. Any one of 0x1A00 to 0x1A1F is assigned to an audio stream to be used as sub audio to be mixed with the main audio.
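The example PID allocation above can be expressed as a small lookup. The following is a minimal sketch that only encodes the ranges listed in the text; the function name `classify_pid` and the role strings are hypothetical, and PIDs outside these ranges (such as those of the PAT, PMT, or PCR) are simply reported as "other".

```python
def classify_pid(pid: int) -> str:
    """Return the stream role for a PID under the example allocation above."""
    if pid == 0x1011:
        return "main video"
    # (lowest PID, highest PID, role); bounds are inclusive.
    ranges = (
        (0x1100, 0x111F, "audio"),
        (0x1200, 0x121F, "presentation graphics"),
        (0x1400, 0x141F, "interactive graphics"),
        (0x1B00, 0x1B1F, "sub video"),
        (0x1A00, 0x1A1F, "sub audio"),
    )
    for low, high, role in ranges:
        if low <= pid <= high:
            return role
    # PIDs outside the example allocation are not classified here.
    return "other"


print(classify_pid(0x1011))  # main video
print(classify_pid(0x1A05))  # sub audio
```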

FIG. 44 is a view schematically illustrating how individual streams are multiplexed into multiplexed data. Video stream ex235 made up of a plurality of video frames and audio stream ex238 made up of a plurality of audio frames are converted into PES (Packetized Elementary Stream) packet sequences ex236 and ex239, and then into TS (Transport Stream) packets ex237 and ex240, respectively. Likewise, data of presentation graphics stream ex241 and data of interactive graphics stream ex244 are converted into PES packet sequences ex242 and ex245, and further into TS packets ex243 and ex246, respectively. Multiplexed data ex247 is formed by multiplexing these TS packets into one stream.
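The PES-to-TS conversion step can be sketched as follows. This is a simplified illustration, not the embodiments' implementation: it assumes the 4-byte header layout of MPEG-2 Systems (sync byte 0x47, payload_unit_start_indicator, 13-bit PID, adaptation_field_control, 4-bit continuity counter), fills a short final packet with adaptation-field stuffing, and omits PCR insertion, scrambling, and priority flags. The name `packetize_pes` is hypothetical.

```python
from typing import List

TS_PACKET_SIZE = 188
TS_HEADER_SIZE = 4
TS_PAYLOAD_SIZE = TS_PACKET_SIZE - TS_HEADER_SIZE  # 184 bytes
SYNC_BYTE = 0x47


def packetize_pes(pes: bytes, pid: int, start_cc: int = 0) -> List[bytes]:
    """Split one PES packet into 188-byte TS packets carrying the given PID."""
    packets = []
    cc = start_cc & 0x0F  # 4-bit continuity counter
    for offset in range(0, len(pes), TS_PAYLOAD_SIZE):
        chunk = pes[offset:offset + TS_PAYLOAD_SIZE]
        pusi = 1 if offset == 0 else 0  # payload_unit_start_indicator
        stuffing = TS_PAYLOAD_SIZE - len(chunk)
        if stuffing == 0:
            afc = 0b01  # payload only
            adaptation = b""
        elif stuffing == 1:
            afc = 0b11  # adaptation field followed by payload
            adaptation = b"\x00"  # adaptation_field_length = 0
        else:
            afc = 0b11
            # Length byte, an all-zero flags byte, then 0xFF stuffing bytes.
            adaptation = bytes([stuffing - 1, 0x00]) + b"\xff" * (stuffing - 2)
        header = bytes([
            SYNC_BYTE,
            (pusi << 6) | ((pid >> 8) & 0x1F),
            pid & 0xFF,
            (afc << 4) | cc,
        ])
        packets.append(header + adaptation + chunk)
        cc = (cc + 1) & 0x0F
    return packets


pkts = packetize_pes(bytes(400), pid=0x1011)
print(len(pkts), [len(p) for p in pkts])  # 3 [188, 188, 188]
```

Only the first TS packet of a PES packet sets payload_unit_start_indicator, which is what lets a demultiplexer find PES boundaries again on the receiving side.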

FIG. 45 illustrates in detail how a video stream is stored in a PES packet sequence. The upper row in FIG. 45 illustrates a video frame sequence of the video stream. The lower row illustrates a PES packet sequence. As denoted by arrows yy1, yy2, yy3, and yy4 in FIG. 45, I-pictures, B-pictures, and P-pictures, which are a plurality of video presentation units in a video stream, are separated on a picture-by-picture basis and are stored in the payloads of respective PES packets. Each PES packet includes a PES header in which a PTS (Presentation Time-Stamp) that represents the display time of the picture and a DTS (Decoding Time-Stamp) that represents the decoding time of the picture are stored.
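To make the PTS and DTS fields concrete, the sketch below packs and unpacks a timestamp using the 5-byte PES header layout of MPEG-2 Systems, which the text does not spell out: a 4-bit prefix, then the 33-bit value split into 3, 15, and 15 bits, each group followed by a marker bit set to 1. Timestamps count ticks of a 90 kHz clock. The helper names are hypothetical.

```python
def encode_timestamp(prefix: int, value: int) -> bytes:
    """Pack a 33-bit PTS/DTS into the 5-byte PES header field."""
    value &= (1 << 33) - 1
    return bytes([
        ((prefix & 0x0F) << 4) | (((value >> 30) & 0x07) << 1) | 0x01,
        (value >> 22) & 0xFF,
        (((value >> 15) & 0x7F) << 1) | 0x01,
        (value >> 7) & 0xFF,
        ((value & 0x7F) << 1) | 0x01,
    ])


def decode_timestamp(field: bytes) -> int:
    """Recover the 33-bit PTS/DTS, skipping the marker bits."""
    return (
        (((field[0] >> 1) & 0x07) << 30)
        | (field[1] << 22)
        | ((field[2] >> 1) << 15)
        | (field[3] << 7)
        | (field[4] >> 1)
    )


pts = 90_000 * 10  # 10 s on the 90 kHz clock
raw = encode_timestamp(0b0010, pts)  # prefix 0010: PTS only, no DTS
print(decode_timestamp(raw) / 90_000)  # 10.0
```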

FIG. 46 illustrates the format of TS packets that are ultimately written in the multiplexed data. A TS packet is a 188-byte fixed-length packet made up of a 4-byte TS header, which includes information such as the PID for identifying a stream, and a 184-byte TS payload, which stores data. A PES packet is divided into portions, and these portions are stored in respective TS payloads. In the case of BD-ROM, each TS packet is given a 4-byte TP Extra Header to form a 192-byte source packet, and the source packet is written in the multiplexed data. The TP Extra Header includes information such as an ATS (Arrival_Time_Stamp). The ATS represents the transfer start time at which transfer of the TS packet to a PID filter of a decoder is to be started. As illustrated by the lowest row in FIG. 46, source packets are arranged in the multiplexed data. The number that is incremented for each source packet from the start of the multiplexed data is called the SPN (Source Packet Number).
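The 192-byte source packet layout lends itself to a short parser. This sketch assumes the commonly described BD layout in which the TP Extra Header carries a 2-bit copy-permission field followed by a 30-bit ATS; the text only states that the header contains the ATS. The SPN is simply the index of the source packet counted from the start of the data. All function names are hypothetical.

```python
from typing import Iterator, Tuple

SOURCE_PACKET_SIZE = 192  # 4-byte TP Extra Header + 188-byte TS packet


def parse_source_packet(sp: bytes) -> Tuple[int, int]:
    """Return (ATS, PID) for one 192-byte source packet."""
    if len(sp) != SOURCE_PACKET_SIZE:
        raise ValueError("a source packet must be 192 bytes long")
    extra_header = int.from_bytes(sp[:4], "big")
    ats = extra_header & 0x3FFFFFFF  # assumed layout: 2-bit flag + 30-bit ATS
    ts = sp[4:]
    if ts[0] != 0x47:
        raise ValueError("TS sync byte 0x47 not found")
    pid = ((ts[1] & 0x1F) << 8) | ts[2]  # 13-bit PID in the TS header
    return ats, pid


def iter_source_packets(data: bytes) -> Iterator[Tuple[int, int, int]]:
    """Yield (SPN, ATS, PID); the SPN counts source packets from the start."""
    for spn in range(len(data) // SOURCE_PACKET_SIZE):
        start = spn * SOURCE_PACKET_SIZE
        ats, pid = parse_source_packet(data[start:start + SOURCE_PACKET_SIZE])
        yield spn, ats, pid


def make_source_packet(ats: int, pid: int) -> bytes:
    """Hypothetical helper that builds a dummy source packet for the demo."""
    header = (ats & 0x3FFFFFFF).to_bytes(4, "big")
    ts = bytes([0x47, (pid >> 8) & 0x1F, pid & 0xFF, 0x10]) + bytes(184)
    return header + ts


stream = make_source_packet(1000, 0x1011) + make_source_packet(2500, 0x1100)
for spn, ats, pid in iter_source_packets(stream):
    print(spn, ats, hex(pid))
# 0 1000 0x1011
# 1 2500 0x1100
```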

TS packets included in the multiplexed data include a PAT (Program Association Table), a PMT (Program Map Table), and a PCR (Program Clock Reference), in addition to the individual streams of video, audio, subtitles, and so forth. The PAT indicates the PID of the PMT used in the multiplexed data, and the PID of the PAT itself is registered as 0. The PMT includes the PIDs of the individual streams of video, audio, subtitles, and so forth included in the multiplexed data; pieces of attribute information of the streams corresponding to the individual PIDs; and various descriptors regarding the multiplexed data. Examples of the descriptors include copy control information that indicates whether or not copying of the multiplexed data is permitted. To achieve synchronization between the ATC (Arrival Time Clock), which is the time axis for the ATS, and the STC (System Time Clock), which is the time axis for the PTS and DTS, the PCR includes information on the STC time corresponding to the ATS at which the PCR packet is transferred to a decoder.
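Because the PAT always travels on PID 0, a demultiplexer can start from it to find the PMT. The sketch below parses a PAT section using the MPEG-2 Systems section syntax (table_id 0x00, 12-bit section_length, five fixed bytes, then 4-byte program entries, then a 4-byte CRC_32). It is a simplified illustration under stated assumptions: the section is assumed to be already extracted from the TS payload (pointer_field removed), and the CRC_32 is not verified.

```python
from typing import Dict


def parse_pat_section(section: bytes) -> Dict[int, int]:
    """Return {program_number: PMT PID} from one PAT section."""
    if section[0] != 0x00:
        raise ValueError("not a PAT section (table_id must be 0x00)")
    section_length = ((section[1] & 0x0F) << 8) | section[2]
    # Program loop starts at byte 8 and stops before the trailing CRC_32.
    loop_end = 3 + section_length - 4
    programs = {}
    for i in range(8, loop_end, 4):
        program_number = (section[i] << 8) | section[i + 1]
        pid = ((section[i + 2] & 0x1F) << 8) | section[i + 3]
        if program_number != 0:  # program_number 0 refers to the network PID
            programs[program_number] = pid
    return programs


# One program (number 1) whose PMT is on PID 0x0100; CRC bytes are dummies.
section = bytes([0x00, 0xB0, 0x0D, 0x00, 0x01, 0xC1, 0x00, 0x00,
                 0x00, 0x01, 0xE1, 0x00, 0x00, 0x00, 0x00, 0x00])
print(parse_pat_section(section))  # {1: 256}
```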

FIG. 47 is a view illustrating the detailed data structure of the PMT. A PMT header, which describes the length of data included in the PMT, is placed at the start of the PMT. The PMT header is followed by a plurality of descriptors regarding the multiplexed data. The copy control information and so forth are described as the descriptors. The descriptors are followed by a plurality of pieces of stream information regarding the individual streams included in the multiplexed data. Each piece of stream information is made up of a stream type for identifying the compression codec of the stream or the like, the PID of the stream, and stream descriptors that describe the attribute information (such as the frame rate and aspect ratio) of the stream. The PMT includes as many stream descriptors as the number of streams included in the multiplexed data.
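The logical structure of FIG. 47 can be modeled with a few data classes. This is a sketch of the structure only: it drops the byte-level encoding and the header length field, and the class names and descriptor keys are hypothetical. In the example, 0x1B is the MPEG-2 Systems stream_type for H.264/AVC video, and 0x81 is used purely as an illustrative audio value.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class StreamInfo:
    stream_type: int  # identifies the compression codec of the stream
    pid: int
    # Stream descriptors: attribute information such as frame rate, aspect ratio.
    descriptors: dict = field(default_factory=dict)


@dataclass
class ProgramMapTable:
    # Descriptors regarding the whole multiplexed data, e.g. copy control information.
    descriptors: dict = field(default_factory=dict)
    # One entry per stream included in the multiplexed data.
    streams: list = field(default_factory=list)

    def stream_for_pid(self, pid: int) -> Optional[StreamInfo]:
        """Return the stream information registered for a PID, if any."""
        for info in self.streams:
            if info.pid == pid:
                return info
        return None


pmt = ProgramMapTable(
    descriptors={"copy_control": "copy permitted"},
    streams=[
        StreamInfo(0x1B, 0x1011, {"frame_rate": "30000/1001", "aspect_ratio": "16:9"}),
        StreamInfo(0x81, 0x1100, {"channels": 6}),
    ],
)
print(hex(pmt.stream_for_pid(0x1011).stream_type))  # 0x1b
```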

In the case where the multiplexed data is recorded on a recording medium or the like, the multiplexed data is recorded together with a multiplexed data information file.

As illustrated in FIG. 48, a multiplexed data information file (clip information file) contains management information of the multiplexed data, has one-to-one correspondence with the multiplexed data, and is made up of multiplexed data information (clip information), stream attribute information, and an entry map.

The multiplexed data information (clip information) is made up of the system rate, the playback start time, and the playback end time, as illustrated in FIG. 48. The system rate represents the maximum transfer rate at which the multiplexed data is transferred to the PID filter of a system target decoder (described later). The intervals between the ATSs included in the multiplexed data are set so that the transfer rate does not exceed the system rate. The playback start time represents the PTS of the first video frame of the multiplexed data. The playback end time is set to the value obtained by adding the playback duration of one frame to the PTS of the last video frame of the multiplexed data.
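As a worked example of the playback end time rule, assume PTS values are counted on the 90 kHz MPEG-2 Systems clock. One frame then lasts 90000 / frame_rate ticks: 3003 ticks at 30000/1001 (about 29.97) frames per second and 3750 ticks at 24 frames per second. The sketch uses exact fractions so that non-integer frame rates do not introduce rounding error; the function names are hypothetical.

```python
from fractions import Fraction

PTS_CLOCK_HZ = 90_000  # PTS/DTS tick rate in MPEG-2 Systems


def frame_duration_ticks(frame_rate: Fraction) -> Fraction:
    """Playback duration of one frame, in 90 kHz PTS ticks."""
    return Fraction(PTS_CLOCK_HZ) / frame_rate


def playback_end_time(last_frame_pts: int, frame_rate: Fraction) -> Fraction:
    """Playback end time = PTS of the last video frame + one frame duration."""
    return last_frame_pts + frame_duration_ticks(frame_rate)


ntsc = Fraction(30000, 1001)  # about 29.97 frames per second
print(frame_duration_ticks(ntsc))  # 3003
print(playback_end_time(900_000, Fraction(24)))  # 903750
```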

For each PID, attribute information of the corresponding stream included in the multiplexed data is registered in the stream attribute information, as illustrated in FIG. 49. The attribute information differs among the video stream, the audio stream, the presentation graphics stream, and the interactive graphics stream. Video stream attribute information includes information such as the compression codec used to compress the video stream, the resolution of individual picture data of the video stream, the aspect ratio, and the frame rate. Audio stream attribute information includes information such as the compression codec used to compress the audio stream, the number of channels included in the audio stream, the supported language, and the sampling frequency. These pieces of information are used to initialize the decoder before a player performs playback.

In the fifth exemplary embodiment, of the information in the multiplexed data, the stream type contained in the PMT is used. Also, in the case where the multiplexed data is recorded on a recording medium, the video stream attribute information contained in the multiplexed data information is used. Specifically, the video coding method or apparatus described in each of the above exemplary embodiments includes a step or unit for setting, in the stream type contained in the PMT or in the video stream attribute information, unique information that indicates whether the video data has been generated by the video coding method or apparatus described in each of the above exemplary embodiments. Therefore, video data generated using the video coding method or apparatus described in each of the above exemplary embodiments and video data based on another standard can be distinguished from each other.

FIG. 50 illustrates steps included in a video decoding method of the fifth exemplary embodiment. In Step exS100, the stream type contained in the PMT or the video stream attribute information contained in the multiplexed data information is obtained from the multiplexed data. Then, in Step exS101, it is determined whether or not the stream type or the video stream attribute information indicates that the multiplexed data has been generated by the video coding method or apparatus described in each of the above exemplary embodiments. When it is determined from the stream type or the video stream attribute information that the multiplexed data has been generated by the video coding method or apparatus described in each of the above exemplary embodiments, decoding is performed by the video decoding method described in each of the above exemplary embodiments in Step exS102. When the stream type or the video stream attribute information indicates that the multiplexed data is pursuant to existing standards such as MPEG-2, MPEG-4 AVC, and VC-1, decoding is performed by a video decoding method pursuant to the existing standard in Step exS103.
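The branch in Steps exS101 to exS103 amounts to a dispatch on the identification information. The sketch below assumes that Step exS100 has already produced a string identifier; the value `"new_method"` is a hypothetical stand-in for the "new unique value" mentioned in the text, and the decoder functions are placeholders that only return a label.

```python
from typing import Callable, Dict

Decoder = Callable[[bytes], str]

# Hypothetical value standing in for the unique value that the coding method
# of the embodiments writes into the stream type / video stream attribute info.
NEW_METHOD_ID = "new_method"


def select_decoder(identification: str, new_decoder: Decoder,
                   legacy_decoders: Dict[str, Decoder]) -> Decoder:
    """Steps exS101 to exS103: pick a decoder from the identification info."""
    if identification == NEW_METHOD_ID:
        return new_decoder  # exS102
    if identification in legacy_decoders:
        return legacy_decoders[identification]  # exS103
    raise ValueError(f"no decoder registered for {identification!r}")


def decode_new(data: bytes) -> str:
    return "decoded with the method of the embodiments"


def decode_mpeg2(data: bytes) -> str:
    return "decoded with MPEG-2"


def decode_avc(data: bytes) -> str:
    return "decoded with MPEG-4 AVC"


legacy = {"MPEG-2": decode_mpeg2, "MPEG-4 AVC": decode_avc}
print(select_decoder("MPEG-4 AVC", decode_new, legacy)(b""))  # decoded with MPEG-4 AVC
```

Raising an error for an unknown identifier, rather than guessing a decoder, reflects the text's point that correct identification is what allows decoding "without causing an error".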

By setting a new unique value in the stream type or the video stream attribute information in this way, it can be determined at the time of decoding whether or not decoding can be performed using the video decoding method or apparatus described in each of the above exemplary embodiments. Accordingly, even in the case where multiplexed data based on a different standard is input, an appropriate decoding method or apparatus can be selected, and thus decoding can be performed without causing an error. Also, the video coding method or apparatus or the video decoding method or apparatus described in the fifth exemplary embodiment is applicable to any of the aforementioned devices and systems.

Sixth Exemplary Embodiment

The video coding method and apparatus and the video decoding method and apparatus described in each of the above exemplary embodiments are typically implemented using an LSI, which is an integrated circuit. FIG. 51 illustrates an example of a configuration of LSI ex500 that is formed as one chip. LSI ex500 includes controller ex501, CPU ex502, memory controller ex503, stream controller ex504, power supply circuit unit ex505, stream input/output (I/O) ex506, signal processor ex507, buffer ex508, and audio/video (AV) I/O ex509, which are connected to one another through bus ex510. Upon power-on, power supply circuit unit ex505 supplies electric power to the individual units to activate them into an operable state.

For example, in the case of performing a coding process, LSI ex500 receives an AV signal from microphone ex117, camera ex113, or the like through AV I/O ex509 in accordance with control performed by controller ex501, which includes CPU ex502, memory controller ex503, stream controller ex504, and driving frequency controller ex512. The input AV signal is temporarily stored in external memory ex511, such as an SDRAM (Synchronous Dynamic Random Access Memory). In accordance with control performed by controller ex501, the stored data is divided into a plurality of portions in accordance with a quantity of processing or a processing speed, and the plurality of portions are sent to signal processor ex507. Then, signal processor ex507 codes the audio signal and/or the video signal. The coding process performed on the video signal here is the coding process described in each of the above exemplary embodiments. Signal processor ex507 performs processing such as multiplexing of the coded audio data and the coded video data depending on circumstances, and outputs the multiplexed data to the outside through stream I/O ex506. This output multiplexed data is transmitted to base station ex107 or written to recording medium ex215. Note that the audio data and the video data may be temporarily stored in buffer ex508 at the time of multiplexing so that these pieces of data are synchronized with each other.

Although memory ex511 has been described above as a device provided outside LSI ex500, memory ex511 may be included in LSI ex500. The number of buffers ex508 is not limited to one; a plurality of buffers may be provided. LSI ex500 may be formed as one chip or as a plurality of chips.

Although controller ex501 has been described as including CPU ex502, memory controller ex503, stream controller ex504, and driving frequency controller ex512, the configuration of controller ex501 is not limited to this one. For example, signal processor ex507 may further include a CPU; providing a CPU in signal processor ex507 as well allows the processing speed to be further enhanced. Alternatively, CPU ex502 may include signal processor ex507 or, for example, an audio signal processor that is part of signal processor ex507. In such a case, controller ex501 includes CPU ex502, which includes signal processor ex507 or part of signal processor ex507.

Note that the term “LSI” is used here; however, the configuration may be referred to as an IC, a system LSI, a super LSI, or an ultra LSI depending on the degree of integration.

Also, the circuit integration technique is not limited to LSI, and circuit integration may be implemented using a dedicated circuit or a general-purpose processor. An FPGA (Field Programmable Gate Array) that is programmable after manufacturing of an LSI, or a reconfigurable processor in which connections and settings of circuit cells within the LSI are reconfigurable, may be used. Such a programmable logic device can typically execute the video coding method or the video decoding method described in each of the above exemplary embodiments by loading or reading, from a memory or the like, a program constituting software or firmware.

When an advance in semiconductor technology or another related technology yields a circuit integration technology that can substitute for LSI, the functional blocks may, of course, be integrated using such a technology. Application of biotechnology is one such possibility.

Seventh Exemplary Embodiment

It is considered that the quantity of processing increases in the case of decoding video data generated using the video coding method or apparatus described in each of the above exemplary embodiments, compared with the case of decoding video data pursuant to existing standards such as MPEG-2, MPEG-4 AVC, and VC-1. Accordingly, in LSI ex500, a higher driving frequency needs to be set in CPU ex502 than that used when video data based on an existing standard is decoded. However, making the driving frequency higher undesirably increases power consumption.

To address this issue, the video decoding apparatus, such as television ex300 or LSI ex500, is configured to identify which standard the video data is based on, and to switch between driving frequencies in accordance with the standard. FIG. 52 illustrates configuration ex800 according to the seventh exemplary embodiment. Driving frequency switching unit ex803 sets the driving frequency high in the case where the video data has been generated using the video coding method or apparatus described in each of the above exemplary embodiments. Driving frequency switching unit ex803 also instructs decoding processor ex801, which executes the video decoding method described in each of the above exemplary embodiments, to decode the video data. On the other hand, in the case where the video data is based on an existing standard, driving frequency switching unit ex803 sets the driving frequency lower than in the case where the video data has been generated by the video coding method or apparatus described in each of the above exemplary embodiments. Then, driving frequency switching unit ex803 instructs decoding processor ex802, which is pursuant to the existing standard, to decode the video data.

More specifically, driving frequency switching unit ex803 includes CPU ex502 and driving frequency controller ex512 illustrated in FIG. 51. Decoding processor ex801, which executes the video decoding method described in each of the above exemplary embodiments, and decoding processor ex802, which is pursuant to an existing standard, correspond to signal processor ex507 illustrated in FIG. 51. CPU ex502 identifies which standard the video data is pursuant to. Based on a signal from CPU ex502, driving frequency controller ex512 sets the driving frequency. Also, based on a signal from CPU ex502, signal processor ex507 decodes the video data. Here, for example, the identification information described in the fifth exemplary embodiment can conceivably be used to identify the video data. The identification information is not limited to the one described in the fifth exemplary embodiment and may be any type of information from which the standard on which the video data is based can be identified. For example, in the case where that standard can be identified from an external signal indicating whether the video data is used for a television or for a disc, the identification can be made on the basis of such an external signal. It is also conceivable, for example, to select the driving frequency of CPU ex502 in accordance with a lookup table in which the standards for the video data are associated with driving frequencies, as illustrated in FIG. 54. The lookup table is stored in buffer ex508 or in an internal memory of LSI ex500, and CPU ex502 refers to this lookup table to select the driving frequency.

FIG. 53 illustrates steps of performing the method of the seventh exemplary embodiment. In Step exS200, signal processor ex507 obtains identification information from the multiplexed data. In Step exS201, based on the identification information, CPU ex502 identifies whether or not the video data has been generated using the video coding method or apparatus described in each of the above exemplary embodiments. If so, CPU ex502 sends a signal for setting a high driving frequency to driving frequency controller ex512 in Step exS202, and driving frequency controller ex512 sets a high driving frequency. On the other hand, if the identification information indicates that the video data is pursuant to existing standards such as MPEG-2, MPEG-4 AVC, and VC-1, CPU ex502 sends a signal for setting a low driving frequency to driving frequency controller ex512 in Step exS203. Then, driving frequency controller ex512 sets a lower driving frequency than that used when the video data has been generated using the video coding method or apparatus described in each of the above exemplary embodiments.

Further, by changing the voltage supplied to LSI ex500 or an apparatus including LSI ex500 in conjunction with the switching of the driving frequency, the power-saving effect can be further increased. For example, it is conceivable that, in the case where a low driving frequency is set, the voltage supplied to LSI ex500 or an apparatus including LSI ex500 is accordingly set lower than in the case where a high driving frequency is set.
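The lookup-table selection of FIG. 54 combined with the steps of FIG. 53 and the paired voltage setting can be sketched as below. This is an illustration under stated assumptions: the standards come from the text, but the frequency and voltage numbers are placeholders rather than values from FIG. 54, `"new_method"` is a hypothetical label for video coded by the embodiments, and `DrivingFrequencyController` merely records the chosen operating point in place of driving frequency controller ex512.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class OperatingPoint:
    frequency_mhz: int
    voltage_v: float


# Hypothetical lookup table in the spirit of FIG. 54. The standards come from
# the text; the frequency and voltage numbers are placeholders.
DRIVING_TABLE = {
    "new_method": OperatingPoint(frequency_mhz=500, voltage_v=1.2),
    "MPEG-4 AVC": OperatingPoint(frequency_mhz=350, voltage_v=1.0),
    "MPEG-2": OperatingPoint(frequency_mhz=350, voltage_v=1.0),
    "VC-1": OperatingPoint(frequency_mhz=350, voltage_v=1.0),
}


class DrivingFrequencyController:
    """Stand-in for driving frequency controller ex512."""

    def __init__(self) -> None:
        self.current: Optional[OperatingPoint] = None

    def apply(self, point: OperatingPoint) -> None:
        # A real controller would reprogram the clock and, optionally, the supply voltage.
        self.current = point


def switch_driving_frequency(identification: str,
                             controller: DrivingFrequencyController) -> OperatingPoint:
    """Steps exS201 to exS203: choose and apply an operating point."""
    point = DRIVING_TABLE.get(identification)  # exS201: identify the standard
    if point is None:
        raise ValueError(f"unknown standard {identification!r}")
    controller.apply(point)  # exS202 (high) or exS203 (low)
    return point


ctrl = DrivingFrequencyController()
print(switch_driving_frequency("MPEG-2", ctrl).frequency_mhz)  # 350
```

Keeping frequency and voltage in one table entry means the voltage always follows the frequency switch, which is the coupling the paragraph above describes.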

It is sufficient that the driving frequency is set higher in the case where the quantity of decoding processing is large and set lower in the case where the quantity of decoding processing is small. Accordingly, the driving frequency setting method is not limited to the above-described setting method. For example, in the case where the processing quantity for decoding video data pursuant to MPEG-4 AVC is larger than the processing quantity for decoding video data generated by the video coding method or apparatus described in each of the above exemplary embodiments, the driving frequency settings can be the reverse of those in the above-described case.

The driving frequency setting method is not limited to a configuration that sets the driving frequency low. For example, in the case where the identification information indicates that the video data has been generated using the video coding method or apparatus described in each of the above exemplary embodiments, the voltage supplied to LSI ex500 or an apparatus including LSI ex500 may be set high. In the case where the identification information indicates that the video data is pursuant to existing standards such as MPEG-2, MPEG-4 AVC, and VC-1, the voltage supplied to LSI ex500 or an apparatus including LSI ex500 may be set low. In another example, in the case where the identification information indicates that the video data has been generated using the video coding method or apparatus described in each of the above exemplary embodiments, driving of CPU ex502 is not stopped. In the case where the identification information indicates that the video data is pursuant to existing standards such as MPEG-2, MPEG-4 AVC, and VC-1, driving of CPU ex502 may be temporarily stopped because there is surplus capacity relative to the processing load. Even in the case where the identification information indicates that the video data has been generated using the video coding method or apparatus described in each of the above exemplary embodiments, driving of CPU ex502 may be temporarily stopped when there is surplus capacity relative to the processing load. In this case, the period over which CPU ex502 is stopped may be set shorter than in the case where the identification information indicates that the video data is pursuant to existing standards such as MPEG-2, MPEG-4 AVC, and VC-1.

By switching between driving frequencies in this manner in accordance with the standard on which the video data is based, electric power can be saved. Also, in the case where LSI ex500 or an apparatus including LSI ex500 is driven by a battery, the power saving makes the battery last longer.

Eighth Exemplary Embodiment

A plurality of pieces of video data based on different standards are sometimes input to the aforementioned devices and systems, such as television ex300 and mobile phone ex114. In order to enable decoding even in the case where a plurality of pieces of video data based on different standards are input, signal processor ex507 of LSI ex500 needs to support the plurality of standards. However, providing a separate signal processor ex507 for each standard undesirably increases the circuit scale of LSI ex500 and the cost.

To address this issue, a decoding processor that executes the video decoding method described in each of the above exemplary embodiments and a decoding processor pursuant to existing standards such as MPEG-2, MPEG-4 AVC, and VC-1 share some of their components. FIG. 55A illustrates an example of configuration ex900. For example, the video decoding method described in each of the above exemplary embodiments and the video decoding method pursuant to MPEG-4 AVC share some processing contents, such as entropy decoding, inverse quantization, deblocking filtering, and motion compensation. Accordingly, the following configuration is conceivable. For the shared processing contents, decoding processor ex902 pursuant to MPEG-4 AVC is shared. For other processing contents that are not pursuant to MPEG-4 AVC but are unique to an aspect of the present disclosure, dedicated decoding processor ex901 may be used. In particular, an aspect of the present disclosure has a feature in motion compensation. Thus, for example, dedicated decoding processor ex901 may be used for motion compensation, and decoding processor ex902 may be used in common for any or all of inverse quantization, entropy decoding, and deblocking filtering. Alternatively, as for sharing of the decoding processor, a configuration may be used in which a decoding processor that executes the video decoding method described in each of the above exemplary embodiments is used for the common processing contents and a dedicated decoding processor is used for processing contents unique to MPEG-4 AVC.

FIG. 55B illustrates another example, ex1000, that implements sharing of part of the processing. In this example, dedicated decoding processor ex1001 that handles processing contents unique to an aspect of the present disclosure, dedicated decoding processor ex1002 that handles processing contents unique to an existing standard, and shared decoding processor ex1003 that handles processing contents common to the video decoding method according to the aspect of the present disclosure and the video decoding method according to the existing standard are used. Here, dedicated decoding processors ex1001 and ex1002 are not necessarily specialized for the processing contents unique to the aspect of the present disclosure and the existing standard, respectively, and may also be capable of executing other general processing. Also, the configuration according to the eighth exemplary embodiment can be implemented using LSI ex500.
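The stage-to-processor assignment described for configuration ex900 can be summarized in a few lines. This sketch only reproduces the example split given in the text (dedicated processor ex901 for motion compensation in the embodiments' format, shared MPEG-4 AVC processor ex902 for the rest). The four-stage pipeline is a simplification that omits stages such as inverse transform and intra prediction, and `"new_method"` is a hypothetical label.

```python
from typing import Dict

# Simplified decoding pipeline using the stage names from the text.
PIPELINE = (
    "entropy_decoding",
    "inverse_quantization",
    "motion_compensation",
    "deblocking_filtering",
)

NEW_METHOD_ID = "new_method"  # hypothetical label for the embodiments' format


def assign_processors(standard: str) -> Dict[str, str]:
    """Map each decoding stage to a processor label, following configuration ex900."""
    if standard not in (NEW_METHOD_ID, "MPEG-4 AVC"):
        raise ValueError(f"configuration ex900 does not cover {standard!r}")
    assignment = {}
    for stage in PIPELINE:
        if standard == NEW_METHOD_ID and stage == "motion_compensation":
            # Motion compensation carries the feature unique to the disclosure.
            assignment[stage] = "ex901 (dedicated)"
        else:
            # Everything else runs on the shared MPEG-4 AVC decoding processor.
            assignment[stage] = "ex902 (shared, MPEG-4 AVC)"
    return assignment


for stage, unit in assign_processors(NEW_METHOD_ID).items():
    print(f"{stage}: {unit}")
```

The configuration of FIG. 55B would add a third label, so that stages unique to the existing standard map to ex1002 and common stages map to ex1003.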

By sharing a decoding processor for processing contents that are common to the video decoding method according to an aspect of the present disclosure and the video decoding method according to an existing standard, the circuit scale and cost of LSI ex500 can be reduced.

The present disclosure can be applied to an image processing device, an image capturing device, and an image playback device. Specifically, for example, the present disclosure can be applied to a digital still camera, a digital movie camera, a camera-equipped mobile phone, and a smartphone.

What is claimed is:
 1. A prediction image generation method for generating a prediction image of a current block, the prediction image generation method comprising: an extraction step of extracting a plurality of first feature points each of which has a local feature quantity, the plurality of first feature points being included in a reconstructed image; a search step of searching a corresponding point from the plurality of first feature points, the corresponding point having a local feature quantity similar to a local feature quantity of a second feature point corresponding to the current block; and a generation step of generating the prediction image from the reconstructed image based on a relationship between the corresponding point and the second feature point, the relationship being expressed by information including a non-parallel translation component.
 2. The prediction image generation method according to claim 1, wherein the second feature point is included in the current block, and in the generation step, the prediction image is generated using a pixel value of a region including the corresponding point in the reconstructed image.
 3. The prediction image generation method according to claim 1, wherein the second feature point is a feature point in surroundings of the current block, and in the generation step, the prediction image is generated using a pixel value of a region that does not include the corresponding point in the reconstructed image.
 4. The prediction image generation method according to claim 3, wherein the reconstructed image is a reconstructed image of a current picture including the current block.
 5. The prediction image generation method according to claim 3, wherein the reconstructed image is a reconstructed image of a picture different from a current picture including the current block.
 6. An image coding method in which the prediction image generation method according to claim 5 is performed, the image coding method comprising: an image coding step of coding the current block using the prediction image.
 7. The image coding method according to claim 6, further comprising: a feature point information coding step of coding feature point information identifying the second feature point in a plurality of third feature points corresponding to the current block, wherein the plurality of third feature points are extracted in the extraction step, and the second feature point is selected from the plurality of third feature points in the search step.
 8. The image coding method according to claim 7, wherein the feature point information indicates a coordinate of the second feature point.
 9. The image coding method according to claim 7, wherein the feature point information indicates a rotation quantity or a scale value which is possessed by the second feature point.
 10. The image coding method according to claim 9, further comprising: a corresponding point information coding step of coding corresponding point information identifying the corresponding point in the plurality of first feature points.
 11. The image coding method according to claim 10, wherein the corresponding point information indicates a coordinate of the corresponding point.
 12. The image coding method according to claim 10, wherein, in the feature point information coding step, indexes are allocated to the plurality of first feature points in a predetermined sequence, and the corresponding point information indicates the index allocated to the corresponding point.
 13. The image coding method according to claim 6, wherein, in the generation step, an initial value of a motion estimation process is set based on the relationship, and the prediction image is generated by performing the motion estimation process using the initial value.
 14. An image decoding method in which the prediction image generation method according to claim 5 is performed, the image decoding method comprising: an image decoding step of decoding the current block using the prediction image.
 15. The image decoding method according to claim 14, further comprising: a feature point information decoding step of decoding feature point information identifying the second feature point in a plurality of third feature points corresponding to the current block, wherein the plurality of third feature points are extracted in the extraction step, and the second feature point is selected from the plurality of third feature points using the feature point information in the search step.
 16. The image decoding method according to claim 15, wherein the feature point information indicates a coordinate of the second feature point.
 17. The image decoding method according to claim 15, wherein the feature point information indicates a rotation quantity or a scale value which is possessed by the second feature point.
 18. The image decoding method according to claim 17, further comprising: a corresponding point information decoding step of decoding corresponding point information identifying the corresponding point in the plurality of first feature points, wherein the corresponding point is searched from the plurality of first feature points using the corresponding point information in the search step.
 19. The image decoding method according to claim 18, wherein, in the feature point information decoding step, indexes are allocated to the plurality of first feature points in a predetermined sequence, and the corresponding point information indicates the index allocated to the corresponding point.
 20. A prediction image generation apparatus that generates a prediction image of a current block, the prediction image generation apparatus comprising: an extraction unit that extracts a plurality of first feature points each of which has a local feature quantity, the plurality of first feature points being included in a reconstructed image; a search unit that searches a corresponding point from the plurality of first feature points, the corresponding point having a local feature quantity similar to a local feature quantity of a second feature point corresponding to the current block; and a generation unit that generates the prediction image from the reconstructed image based on a relationship between the corresponding point and the second feature point, the relationship being expressed by information including a non-parallel translation component.